PLASMA: An AI Approach That Endows Small Language Models With Procedural Knowledge And (Counterfactual) Planning Capabilities

Large language models (LLMs) excel at many downstream tasks that call for common sense, thanks to their vast size. One such task is procedural planning: decomposing a high-level goal (for instance, "see a movie") into a sequence of coherent, goal-oriented steps (a plan), such as "Look up movie showtimes" and "Choose a movie." Recent methods use LLMs to model this task as a conditional text generation problem. While LLMs perform well on it, their high computational cost and limited accessibility hamper widespread deployment.
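As a toy illustration of the conditional-generation framing (the prompt template and helper names below are my own, not from the paper), the model is conditioned on a goal and decodes an ordered list of steps:

```python
# Illustrative only: procedural planning framed as conditional text
# generation. A model would be conditioned on the prompt and asked
# to continue with the plan.

def make_planning_prompt(goal: str) -> str:
    """Build a simple conditioning prompt for a goal (hypothetical template)."""
    return f"Goal: {goal}\nPlan:\n"

def format_plan(steps: list[str]) -> str:
    """Render a step list the way a generated plan might look."""
    return "\n".join(f"Step {i}: {s}" for i, s in enumerate(steps, 1))

prompt = make_planning_prompt("see a movie")
plan = format_plan(["Look up movie showtimes", "Choose a movie", "Buy a ticket"])
print(prompt + plan)
```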

Researchers from the Allen Institute for Artificial Intelligence, the University of Washington, the University of Southern California, Tohoku University, and the University of Pittsburgh present PLASMA (PLAn with SMAll models), a novel two-pronged framework for equipping small LMs with planning capabilities. They enhance the implicit knowledge in small LMs via symbolic procedural knowledge distillation and enable structured reasoning with an inference-time decoding algorithm (Figure 1). Distilling large-scale procedural knowledge proceeds in two stages:

(i) knowledge verbalization to elicit procedural knowledge from an LLM, and

(ii) knowledge distillation to transfer the LLM-generated knowledge to a smaller LM.
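The two-stage recipe can be sketched as follows. This is a hedged sketch under my own naming assumptions, not the authors' code: `teacher_verbalize` stands in for querying the large teacher LLM, and the resulting (goal, plan) pairs become supervised training data for the student.

```python
# Stage 1: knowledge verbalization -- a teacher LLM generates plans
# for goals, yielding a COPLAN-style corpus.
# Stage 2: knowledge distillation -- a small student LM is fine-tuned
# on that corpus (fine-tuning itself not shown here).

def teacher_verbalize(goal: str) -> list[str]:
    # Stand-in for prompting a large teacher LLM for a plan.
    canned = {
        "see a movie": ["Look up showtimes", "Choose a movie", "Buy a ticket"],
    }
    return canned.get(goal, ["<step>"])

def build_distillation_corpus(goals: list[str]) -> list[dict]:
    corpus = []
    for g in goals:
        plan = teacher_verbalize(g)
        corpus.append({"input": f"Goal: {g}", "target": "\n".join(plan)})
    return corpus

corpus = build_distillation_corpus(["see a movie"])
# Stage 2 would fine-tune a small LM on `corpus` with the usual
# next-token prediction loss over the target plans.
print(corpus[0]["target"])
```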

They verbalize knowledge not only for the standard planning task but also for novel task formulations in counterfactual settings, namely counterfactual planning and plan revision.

Figure 1: Symbolic procedural knowledge distillation

In these tasks, the model generates or revises a plan for a given goal (for example, "see a movie") while satisfying an additional condition (for example, "at home"). Such tasks create a more realistic setting by requiring models to reason about contextually constrained scenarios that arise in real-world applications. Their knowledge verbalization procedure yields COPLAN, a sizable dataset for (counterfactual) procedural planning. COPLAN is then used to train smaller models, PLASMA, via task-specific and multi-task distillation. The authors observe, however, that the standard next-token prediction objective of auto-regressive LMs (applied during distillation) provides neither the causal and temporal reasoning abilities needed to produce high-quality plans nor a mechanism for correcting mistakes made in earlier steps.

To overcome this difficulty, they develop PLASMA+, a verifier-guided step-wise beam search that better exploits the multi-step structure of plans. Specifically, they incorporate a step-wise verifier into the decoding procedure, helping PLASMA+ produce plans that are more semantically coherent and temporally accurate. Through experiments, they demonstrate that their strategy successfully endows smaller LMs with planning capabilities: student models of varied sizes outperform their teacher by 17.57% on average on the standard planning task, and the best student model is even competitive with GPT-3, a model 16 times its size.
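The decoding idea can be sketched as follows. In this minimal sketch, `propose` and `verify` are toy stand-ins of my own making for the student LM's next-step candidates and the trained step verifier; the real system scores candidates with a learned model rather than a lookup.

```python
import heapq

def stepwise_beam_search(goal, propose, verify, beam_width=2, max_steps=3):
    """Verifier-guided step-wise beam search (sketch): keep the
    `beam_width` best partial plans, scoring each candidate next step
    with a verifier instead of raw token likelihood."""
    beams = [(0.0, [])]  # (cumulative verifier score, steps so far)
    for _ in range(max_steps):
        candidates = []
        for score, steps in beams:
            for nxt in propose(goal, steps):
                candidates.append((score + verify(goal, steps, nxt),
                                   steps + [nxt]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])[1]

def propose(goal, steps):
    # Toy generator: offer every step not yet used.
    options = ["Look up showtimes", "Choose a movie", "Buy a ticket"]
    return [o for o in options if o not in steps]

def verify(goal, steps, nxt):
    # Toy verifier: reward steps that appear in the canonical order.
    order = ["Look up showtimes", "Choose a movie", "Buy a ticket"]
    return 1.0 if order.index(nxt) == len(steps) else 0.0

plan = stepwise_beam_search("see a movie", propose, verify)
print(plan)  # the temporally ordered three-step plan
```

Because the verifier scores each step in context of the steps chosen so far, the search can demote partial plans whose early steps were locally plausible but temporally wrong, which is exactly what plain next-token decoding cannot do.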

Furthermore, they distill counterfactual planning capabilities into small models for the first time, reaching a 93% validity rate in human evaluation. Their model also substantially exceeds earlier GPT-3-based work in a simulated environment, by 17% in executability and 25% in correctness. Taken as a whole, their framework, comprising symbolic procedural knowledge distillation, the decoding-time algorithm, the proposed tasks, and the COPLAN dataset, offers a significant resource and a point of departure for future research in procedural planning.