ReWOO – Detaching Reasoning from External Observations to Reduce Token Consumption

Large Language Models (LLMs) have made their way into some of the most challenging areas of Artificial Intelligence. With their ability to generate original, creative content with strong linguistic accuracy and consistency, LLMs are being adopted across many industries. LLMs are often augmented with reasoning skills and the ability to use external tools. Augmentation here means extending a model with additional components or capabilities: augmented LLMs are equipped with external tools and skills so that they can perform beyond their inherent capabilities.

Augmented Language Models (ALMs) have enabled applications such as Auto-GPT for autonomous task execution. Most current ALM work relies on a prompting paradigm that interleaves verbal reasoning with tool calls. This approach has been successful, but it has two drawbacks. First, the LLM must be repeatedly executed and suspended so that it can interact with tools, which adds latency and increases token use. Second, because an LLM generates tokens conditioned on its prior context, every time it resumes after a tool response it must be fed the entire history again. This causes considerable prompt redundancy, which raises the cost of token consumption for commercial LLM services.

Recently, a team of researchers introduced ReWOO (Reasoning Without Observation), a modular paradigm designed to address these issues. ReWOO separates the LLM's reasoning process from external observations, which reduces the computational load of repeated prompts and significantly lowers token consumption.
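To make the redundancy argument concrete, here is a toy token-cost model in Python. It is a sketch under simplifying assumptions that are not taken from the paper: every LLM call re-reads a fixed base prompt, each tool-use round adds a fixed number of reasoning/action tokens (`step`) and observation tokens (`obs`), output tokens are ignored, and tool calls made by Workers cost no LLM tokens. The function names and numbers are illustrative only.

```python
# Toy token-cost model (illustrative assumption, not from the ReWOO paper):
# every LLM call re-reads a fixed base prompt, and each tool-use round
# appends `step` tokens of reasoning/action text plus `obs` tokens of tool output.
# Only input (prompt) tokens are counted; output tokens are ignored.


def interleaved_input_tokens(base: int, step: int, obs: int, n_tools: int) -> int:
    """Input tokens for a ReAct-style loop: the LLM is suspended at every tool
    call and resumed by re-feeding the whole history. With n_tools tool calls
    there are n_tools + 1 LLM calls, and call k re-reads k earlier rounds."""
    return sum(base + k * (step + obs) for k in range(n_tools + 1))


def rewoo_input_tokens(base: int, step: int, obs: int, n_tools: int) -> int:
    """Input tokens for a ReWOO-style run: one Planner call that writes the
    whole plan up front, tool calls made by Workers without the LLM, and one
    Solver call that reads the plan plus all collected evidence."""
    planner_input = base
    solver_input = base + n_tools * (step + obs)
    return planner_input + solver_input


if __name__ == "__main__":
    args = dict(base=1000, step=50, obs=200, n_tools=5)
    print("interleaved:", interleaved_input_tokens(**args))  # 9750
    print("ReWOO-style:", rewoo_input_tokens(**args))        # 3250
```

With these made-up numbers the interleaved loop reads 9,750 input tokens versus 3,250 for the ReWOO-style run. The interleaved cost grows quadratically with the number of tool calls, because each resumed call re-reads every earlier round, while the decoupled cost grows linearly. The exact ratio depends entirely on the assumed prompt sizes and is not meant to reproduce the figures reported by the authors.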

The key components of an ALM are step-wise reasoning, tool calls, and summarization, which ReWOO assigns to three separate modules: Planner, Worker, and Solver. The Planner breaks a task down into a blueprint of interdependent plans, each of which is assigned to a Worker. The Worker retrieves external knowledge from tools to provide evidence, and the Solver synthesizes all the plans and evidence to produce the final answer to the original task.
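The division of labor between the three modules can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the plan syntax (`#E1 = Tool[input]`), the function names, and the dictionary-backed `Lookup` tool are hypothetical, and the final Solver LLM call is left out, so the sketch stops at building the Solver's prompt. The property it shows is that the whole plan is written before any tool runs, and Workers fill in the evidence variables afterward.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumed plan syntax: one evidence step per line, e.g. "#E1 = Lookup[Eiffel Tower]".
STEP_RE = re.compile(r"^\s*(#E\d+)\s*=\s*(\w+)\[(.*)\]\s*$")
EVIDENCE_REF = re.compile(r"#E\d+")

Step = Tuple[str, str, str]  # (evidence variable, tool name, tool input)


def parse_plan(plan_text: str) -> List[Step]:
    """Planner output -> ordered steps; free-text 'Plan:' lines are skipped."""
    steps: List[Step] = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line)
        if match:
            var, tool, arg = match.groups()
            steps.append((var, tool, arg))
    return steps


def run_workers(steps: List[Step],
                tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Workers: call each tool in plan order, replacing references to earlier
    evidence (#E1, #E2, ...) in the tool input with the collected values."""
    evidence: Dict[str, str] = {}
    for var, tool_name, tool_input in steps:
        resolved = EVIDENCE_REF.sub(
            lambda m: evidence.get(m.group(0), m.group(0)), tool_input)
        try:
            evidence[var] = tools[tool_name](resolved)
        except Exception as exc:  # failed/unknown tool -> error note, run continues
            evidence[var] = f"(tool {tool_name} failed: {exc!r})"
    return evidence


def build_solver_prompt(task: str, plan_text: str,
                        evidence: Dict[str, str]) -> str:
    """Solver input: task, plan and all evidence, to be sent to an LLM once."""
    lines = [f"Task: {task}", "Plan:", plan_text, "Evidence:"]
    lines += [f"{var} = {value}" for var, value in evidence.items()]
    lines.append("Answer the task using the plan and evidence above.")
    return "\n".join(lines)


if __name__ == "__main__":
    plan = (
        "Plan: Find the city where the Eiffel Tower stands.\n"
        "#E1 = Lookup[city of the Eiffel Tower]\n"
        "Plan: Find the country that city is in.\n"
        "#E2 = Lookup[country of #E1]"
    )
    facts = {"city of the Eiffel Tower": "Paris", "country of Paris": "France"}
    tools = {"Lookup": lambda q: facts[q]}  # hypothetical stand-in for a real tool
    evidence = run_workers(parse_plan(plan), tools)
    print(evidence)  # {'#E1': 'Paris', '#E2': 'France'}
    print(build_solver_prompt("Which country is the Eiffel Tower in?", plan, evidence))
```

Because later steps refer to earlier evidence by variable name rather than by its content, the Planner never needs to see tool outputs. Catching tool exceptions and passing an error note to the Solver is one simple way (an assumption of this sketch) to keep a run going when a tool fails.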

To evaluate ReWOO's performance, the team carried out a thorough analysis across six open Natural Language Processing (NLP) benchmarks and a curated dataset. The results showed consistent improvements with the proposed method: on HotpotQA, a benchmark of multi-step reasoning tasks, ReWOO achieved a 5× gain in token efficiency and a 4% improvement in accuracy. ReWOO also proved robust in scenarios where the external tools failed.

Decoupling the parametric modules from nonparametric tool calls not only improves prompt efficiency but also makes instruction fine-tuning possible in ReWOO. Through fine-tuning, the reasoning capability of the 175B-parameter GPT-3.5 can be offloaded to a much smaller 7B LLaMA model, a 25-fold reduction in parameters, which highlights the potential for building effective and scalable ALMs.
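One plausible way to prepare data for such offloading is to record the plans a large teacher model writes for a set of tasks and store them as instruction-tuning examples for a smaller student. The sketch below assumes an Alpaca-style `instruction`/`input`/`output` record schema, a hypothetical `teacher_planner` callable standing in for the large model, and a simple filter that keeps only plans containing at least one `#E` step; the paper's actual data pipeline may differ.

```python
import json
from typing import Callable, Iterable, List

# Hypothetical instruction text shown to the student model during fine-tuning.
PLANNER_INSTRUCTION = (
    "Break the task into steps. For each step, write a 'Plan:' line followed "
    "by an evidence line of the form #E<n> = Tool[input]."
)


def build_planner_dataset(tasks: Iterable[str],
                          teacher_planner: Callable[[str], str]) -> List[dict]:
    """Pair each task with the plan a large teacher model writes for it.

    Only the Planner's text becomes the training target; tasks whose plan
    contains no '#E' evidence step are dropped as unusable."""
    records = []
    for task in tasks:
        plan = teacher_planner(task)
        if "#E" in plan:
            records.append({"instruction": PLANNER_INSTRUCTION,
                            "input": task,
                            "output": plan})
    return records


def to_jsonl(records: List[dict]) -> str:
    """One JSON object per line, a common file format for fine-tuning data."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    def fake_teacher(task: str) -> str:  # stand-in for a large LLM planner
        return f"Plan: Search for the answer.\n#E1 = Search[{task}]"

    print(to_jsonl(build_planner_dataset(["Who wrote Hamlet?"], fake_teacher)))
```

Since ReWOO plans are written before any tool is called, the training targets contain no tool outputs, so the same records should stay valid even if the tools' answers change later.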

Overall, ReWOO is a promising modular paradigm for ALMs because, for the first time, it addresses the challenges of redundant prompts and computational complexity.