With the increasing popularity of Large Language Models (LLMs), new research and advancements are introduced almost every day. Built on deep learning, LLMs are continuously evolving and spreading into every domain. LLMs are first trained on massive amounts of raw text and then fine-tuned to enhance their performance. During fine-tuning, they are trained on particular tasks, such as classification, question answering, and document summarization, using direct training signals that measure their performance, for example classification accuracy.
Recently, a new fine-tuning paradigm called LETI (Learn from Textual Interactions) has been introduced, which explores the potential of Large Language Models to learn from textual interactions and feedback. LETI enables language models to understand not just whether they were wrong but why they were wrong. This approach lets LLMs move beyond the limitations of learning solely from labels and scalar rewards.
The researchers behind LETI describe how the approach provides textual feedback to the language model. Binary labels indicate whether the model’s outputs are correct, while the textual feedback identifies and explains the errors in its generated code. The LETI paradigm resembles the iterative process of software development, in which a developer writes a program, tests it, and improves it based on feedback. Similarly, LETI fine-tunes the LLM by providing textual feedback that pinpoints bugs and errors.
During the fine-tuning process, the model is prompted with a natural language problem description and generates a set of candidate solutions. A Solution Evaluator then runs these solutions against a set of test cases. The researchers used a Python interpreter as the Solution Evaluator, so the error messages and stack traces produced by the generated code serve as the source of textual feedback.
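The evaluation step can be illustrated with a minimal sketch. It assumes that each test case is a Python `assert` statement and that the generated program is a string of source code; the function name `evaluate_solution` and the return format are hypothetical and not taken from the LETI paper.

```python
import traceback


def evaluate_solution(program: str, test_cases: list[str]) -> tuple[bool, str]:
    """Run a generated program against assert-style test cases.

    Returns (passed, feedback). The feedback is empty on success and
    holds the interpreter's traceback text on failure.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)       # define the candidate function(s)
        for test in test_cases:
            exec(test, namespace)      # each test is a single assert statement
    except Exception:
        # The stack trace is the textual feedback for a failing program.
        return False, traceback.format_exc()
    return True, ""


# Hypothetical example: one buggy and one correct candidate.
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
tests = ["assert add(2, 3) == 5"]

passed, feedback = evaluate_solution(buggy, tests)
print(passed)
print(feedback)
```

Calling `exec` on model-generated code is unsafe outside of a sandbox, so a real setup would run the evaluator in an isolated process with a timeout.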
The training data used for fine-tuning the model consists of three components: natural language instructions, LM-generated programs, and textual feedback. When a generated program fails to solve the problem, textual feedback is provided to the LLM. Otherwise, the model receives a reward token as binary feedback, which encourages it to generate accurate solutions. The model is then fine-tuned on this feedback in a process known as Feedback-Conditioned Fine-Tuning.
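As a rough illustration of how the three components might be combined into a single training sequence, consider the sketch below. The token string `<|good|>`, the `Interaction` class, and the ordering of the pieces are assumptions made for this example, not details confirmed by the description above.

```python
from dataclasses import dataclass

REWARD_TOKEN = "<|good|>"  # hypothetical token string, chosen for illustration


@dataclass
class Interaction:
    instruction: str    # natural language problem description
    program: str        # LM-generated solution
    passed: bool        # verdict from the evaluator
    feedback: str = ""  # error message or stack trace when the program fails


def to_training_text(item: Interaction) -> str:
    """Concatenate the components into one fine-tuning sequence.

    Assumed layout: conditioning signal, then instruction, then program.
    The signal is the reward token on success and the textual feedback
    on failure.
    """
    signal = REWARD_TOKEN if item.passed else item.feedback.strip()
    return "\n".join([signal, item.instruction.strip(), item.program.strip()])


good = Interaction("Add two numbers.", "def add(a, b):\n    return a + b", True)
bad = Interaction(
    "Add two numbers.",
    "def add(a, b):\n    return a - b",
    False,
    "AssertionError\n",
)
print(to_training_text(good))
print(to_training_text(bad))
```

Placing the signal first means the model learns to generate a program conditioned on the feedback, so at inference time it could be prompted with the reward token to steer it toward correct solutions.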
For evaluation, the researchers used a dataset of code generation tasks called MBPP (Mostly Basic Programming Problems). The results show that LETI significantly improves the performance of two base LMs of different scales on MBPP without requiring ground-truth outputs for training. On the HumanEval dataset, whose problems were not seen during training, LETI achieves performance similar to or better than that of the base LMs. Moreover, the researchers found that, compared to binary feedback alone, textual feedback allows the model to reach the same performance with fewer gradient steps.
In conclusion, LETI is a promising fine-tuning approach that enhances language models by using detailed textual feedback. It enables them to learn from their mistakes and improve performance on tasks like code generation.