Stanford Researchers Propose EVAPORATE: A New AI Approach That Reduces Inference Cost of Language Models by 110x

Large Language models are constantly in the headlines nowadays. With their extraordinary capabilities and applications in various domains, a new research paper or a new update in an LLM is getting released almost every day. Current LLMs have a huge number of parameters which makes the training cost extremely high. They are trained on trillions of tokens, which makes them super expensive. 

In a recently released research paper, some Stanford University and Cornell University students have proposed a method that can deal with the challenge of expensive LLMs. The team has shared how Language Models (LMs) are costly when processing large documents. They have quoted an example of the cost of running inference over 55 million Wikipedia pages, which is greater than $100,000, and is equivalent to a price of more than $0.002 per 1000 tokens. The approach proposed by the authors can reduce inference costs by a factor of 110 while also improving the quality of the results compared to directly running inference over each document.

Called EVAPORATE, LLMs power this prototype system and identify two different strategies for implementing the system. The first strategy is to prompt the LLM to extract values directly from documents. The second is to prompt the LLM to synthesize code that performs the extraction. The team has evaluated these two approaches and found a cost-quality tradeoff between them. While code synthesis was cheaper, it was also less accurate than directly processing each document with the LLM.

EVAPORATE identifies redundancies across multiple documents and exploits them to improve efficiency. The team has used the example of extracting the device classification attribute from FDA reports for medical devices to illustrate this. Instead of processing every semi-structured document with the LLM, the authors explore using the LLM to generate functions that can be reused to extract from every document.

In order to improve the quality as well as maintain low cost, the team has proposed an extended code synthesis implementation called EVAPORATE-CODE+. This approach generates many candidate functions and ensembles their extractions using weak supervision. While weak supervision is traditionally applied to human-generated functions, EVAPORATE-CODE+ operates with machine-generated functions and addresses the challenges of this setup to enable quality improvements.

EVAPORATE has been evaluated on 16 sets of documents across a range of formats, topics, and attribute types. EVAPORATE-CODE+ outperforms the SOTA systems by using a sublinear pass over the documents with the LLM, resulting in a 110x reduction in the number of tokens the LLM needs to process, averaged across the 16 evaluation settings of 10k documents each.

In conclusion, this paper presents a promising approach for automating the extraction of tables from semi-structured documents using LLMs. By identifying the tradeoffs between direct extraction and code synthesis and proposing an extended implementation that achieves better quality while maintaining low cost, this work will definitely make progress toward the data management community.

Check out the Paper and Repo. Don’t forget to join our 20k+ ML SubRedditDiscord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at [email protected]

???? Check Out 100’s AI Tools in AI Tools Club

Tanya Malhotra is a final year undergrad from the University of Petroleum Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.