We have all become familiar with large language models (LLMs) in recent months with the introduction of ChatGPT, which quickly became an essential tool in our daily lives. LLMs are useful in information retrieval, chat assistance, writing assistance, etc.
Generally, LLMs have strong reasoning capabilities, meaning they can use logical reasoning or deduction to arrive at a solution based on given information. They can make inferences, draw conclusions, and logically connect pieces of information. For example, they can answer questions like "Suppose you have a series of numbers: 2, 4, 6, 8, 10, ... What is the next number in the sequence?"
Reasoning tasks are considered more challenging than simpler language understanding tasks, as they require a higher level of comprehension and reasoning ability. LLMs handle simple cases well, but things change when we ask them to perform well on complex reasoning tasks.
A simple way to guide LLMs is in-context learning. Here, before sending your main request, you give the LLM a set of example question-answer pairs so that it can learn what you really want to ask. For example, you can change the prompt from "Suppose you have a series of numbers: 2, 4, 6, 8, 10, ... What is the next number in the sequence?" to "Q: Suppose you have a series of numbers: 2, 4, 6, 8, 10, ... What is the next number in the sequence? A: It is 12 because each number increases by two. Q: Suppose you have a series of numbers: 3, 7, 11, ... What is the next number in the sequence?" Because the example answer spells out its reasoning, the LLM can see the chain of thought (CoT) and adapt accordingly.
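As a minimal sketch of how such a prompt could be assembled, the snippet below joins worked examples and the new question into one string. The helper name `build_cot_prompt` and the `Q:`/`A:` layout are illustrative assumptions, not part of any particular library.

```python
def build_cot_prompt(exemplars, question):
    """Join (question, worked answer) pairs, then append the new question."""
    # Each exemplar shows the reasoning inside its answer, not just the result.
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    # Leave the final answer open so the model completes it.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


exemplars = [
    (
        "Suppose you have a series of numbers: 2, 4, 6, 8, 10, ... "
        "What is the next number in the sequence?",
        "It is 12 because each number increases by two.",
    ),
]

prompt = build_cot_prompt(
    exemplars,
    "Suppose you have a series of numbers: 3, 7, 11, ... "
    "What is the next number in the sequence?",
)
print(prompt)
```

The resulting string would then be sent to the model as a single request. Adding more exemplars only means appending more pairs to the list.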
CoT prompting has been shown to endow LLMs with good reasoning abilities. However, it depends heavily on human engineering to select informative questions and annotate them with CoT and answers. As you can imagine, the question-answer chain you provide is of the utmost importance.
Due to the considerable diversity in difficulty, scope, and domain among reasoning tasks, it is uncertain which type of question should be prioritized for annotation. Additionally, it is unclear whether a specific group of examples is the most effective in obtaining the intended information. On the other hand, if we could determine the important questions, annotating them would be a pretty straightforward task. The question is how to choose the questions.
This is where Active Prompting comes into play. It addresses this problem by leveraging uncertainty and requiring only a small amount of human effort to annotate a small set of questions.
The proposed method first introduces several metrics to characterize the uncertainty among the LLM's predictions. These uncertainty metrics are then used to rank the questions, and the most uncertain ones are selected for annotation. Then, example answers are generated using a few-shot CoT or zero-shot CoT approach.
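The ranking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: `select_most_uncertain` is a hypothetical helper, the sampled answers are made-up data, and the scoring function is passed in so that any uncertainty metric can be plugged in. A plain count of distinct answers is used here as a stand-in.

```python
from typing import Callable, Dict, List


def select_most_uncertain(
    sampled_answers: Dict[str, List[str]],
    score: Callable[[List[str]], float],
    n: int,
) -> List[str]:
    """Rank questions by uncertainty score (highest first) and keep the top n."""
    ranked = sorted(
        sampled_answers,
        key=lambda question: score(sampled_answers[question]),
        reverse=True,
    )
    return ranked[:n]


# Made-up example: several answers sampled from the model for each question.
sampled = {
    "q1": ["12", "12", "12", "12"],  # the model always agrees with itself
    "q2": ["15", "14", "15", "16"],  # some disagreement
    "q3": ["7", "9", "8", "6"],      # every sample differs
}

# Stand-in uncertainty score: the number of distinct answers.
to_annotate = select_most_uncertain(sampled, score=lambda a: len(set(a)), n=2)
print(to_annotate)  # ['q3', 'q2']
```

The selected questions are the ones a human would then annotate with a chain of thought and an answer.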
Four distinct approaches are used to estimate uncertainty: disagreement, entropy, variance, and self-confidence. Each of these strategies offers a unique perspective on the nature of uncertainty, but the main focus is on the disagreement and entropy methods. Disagreement counts the unique answers among the model's predictions for a question: the more distinct answers, the more uncertain the model is. Entropy is computed over the distribution of those answers, so higher entropy indicates more uncertainty and lower entropy indicates less. As a result, when it comes to intricate reasoning, questions with relatively high entropy are more likely to be selected as candidates for annotation.
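To make the two main metrics concrete, here is a small sketch that computes them from a list of sampled answers. The function names are illustrative, and two details are assumptions rather than something stated above: the unique-answer count is divided by the number of samples, and entropy uses the natural logarithm over the empirical answer frequencies.

```python
import math
from collections import Counter


def disagreement(answers):
    """Fraction of distinct answers among the samples (normalization assumed)."""
    return len(set(answers)) / len(answers)


def entropy(answers):
    """Shannon entropy (natural log) of the empirical answer distribution."""
    total = len(answers)
    counts = Counter(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


consistent = ["12", "12", "12", "12"]
scattered = ["7", "9", "8", "6"]

print(disagreement(consistent), disagreement(scattered))  # 0.25 1.0
print(entropy(scattered))  # log(4), about 1.386
```

When every sample agrees, the entropy is zero. When every sample differs, it reaches its maximum of log(k) for k samples, which is why high-entropy questions are treated as the most uncertain.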
The proposed solution is evaluated on several reasoning tasks, and the results show that it outperforms baseline methods in terms of accuracy and efficiency. The paper also provides an analysis of the uncertainty metrics and shows how they can be used to improve the performance of the model.
In conclusion, Active Prompting is a solution to the problem of determining which questions are the most important and helpful to annotate for CoT prompting. It leverages uncertainty and is designed to minimize the human effort needed to annotate a set of questions. The results show that the proposed solution outperforms baseline methods and can be used to improve the performance of LLMs on reasoning tasks.