Stanford Researchers Introduce FrugalGPT: A New AI Framework For LLM APIs To Handle Natural Language Queries


Many businesses (OpenAI, AI21, Cohere, etc.) offer LLMs as a service, given their attractive potential in commercial, scientific, and financial settings. While GPT-4 and other LLMs have demonstrated record-breaking performance on tasks such as question answering, using them in high-throughput applications can be prohibitively expensive. For instance, using GPT-4 to support customer service can cost a small business over $21,000 a month, and running ChatGPT is estimated to cost over $700,000 per day. Beyond the monetary price tag, using the largest LLMs also has serious negative effects on the environment and society.

As the researchers note, many LLMs are accessible through APIs at widely varying prices. The cost of using an LLM API typically has three components:

1. The prompt cost, which scales with the length of the prompt.
2. The generation cost, which scales with the length of the generation.
3. A fixed cost per query.
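
Put together, the three components give a simple per-query formula: cost = prompt price × prompt length + generation price × generation length + fixed fee. The minimal Python sketch below computes it. `ApiPricing` and `query_cost` are hypothetical names, and the prices in the usage lines are invented for illustration, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiPricing:
    """Illustrative price sheet for one LLM API (all numbers are made up)."""
    prompt_cost_per_token: float      # charged for every prompt (input) token
    generation_cost_per_token: float  # charged for every generated (output) token
    fixed_cost_per_query: float       # flat fee per request, if any


def query_cost(pricing: ApiPricing, prompt_tokens: int, generated_tokens: int) -> float:
    """Cost of one API call: prompt part + generation part + fixed part."""
    if prompt_tokens < 0 or generated_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (pricing.prompt_cost_per_token * prompt_tokens
            + pricing.generation_cost_per_token * generated_tokens
            + pricing.fixed_cost_per_query)


# Hypothetical prices: a short prompt on a cheap API versus
# a long few-shot prompt on an expensive one.
cheap = ApiPricing(prompt_cost_per_token=1e-6, generation_cost_per_token=2e-6,
                   fixed_cost_per_query=0.0)
expensive = ApiPricing(prompt_cost_per_token=3e-5, generation_cost_per_token=6e-5,
                       fixed_cost_per_query=0.0)
print(query_cost(cheap, prompt_tokens=200, generated_tokens=50))
print(query_cost(expensive, prompt_tokens=2000, generated_tokens=50))
```

Because the prompt term grows with prompt length, a long few-shot prompt sent to an expensive API can dominate the bill even when the answer itself is short.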

Given this wide variation in price and quality, practitioners may find it hard to decide how best to use the available LLM APIs. Furthermore, relying on a single API provider is risky: service can be interrupted, for example during periods of unexpectedly high demand.

Existing model-ensemble paradigms, such as model cascades and FrugalML, were developed for prediction tasks with a fixed set of labels and do not account for the limitations of LLMs.

Recent research from Stanford University proposes FrugalGPT, a budget-friendly framework that uses LLM APIs to handle natural language queries.

The researchers outline three primary strategies for reducing cost: prompt adaptation, LLM approximation, and LLM cascade. Prompt adaptation looks for the most efficient prompts, since shorter prompts cost less. LLM approximation replaces a complex, high-priced LLM with simpler, more cost-effective alternatives that perform as well as the original. The key idea of the LLM cascade is to dynamically select which LLM APIs to use for each query.
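
The first two strategies can be illustrated with a minimal Python sketch, assuming a crude whitespace token count in place of a real tokenizer. `adapt_prompt` shows one form of prompt adaptation (keeping only as many in-context examples as fit a token budget), and `CompletionCache` shows one simple form of LLM approximation, in which stored earlier completions stand in for the LLM on repeated queries. All names here are hypothetical, and neither function is presented as FrugalGPT's actual implementation.

```python
from typing import Callable, Dict, List


def count_tokens(text: str) -> int:
    """Crude token estimate (assumption): real APIs use their own tokenizers."""
    return len(text.split())


def adapt_prompt(examples: List[str], query: str, max_prompt_tokens: int) -> str:
    """Prompt adaptation sketch: keep in-context examples, in the given order,
    only while the whole prompt stays within the token budget."""
    kept: List[str] = []
    used = count_tokens(query)
    for example in examples:
        needed = count_tokens(example)
        if used + needed > max_prompt_tokens:
            break  # stop at the first example that would exceed the budget
        kept.append(example)
        used += needed
    return "\n".join(kept + [query])


class CompletionCache:
    """LLM approximation sketch: serve repeated prompts from a local cache
    instead of paying for another API call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm                  # hypothetical wrapper around an LLM API
        self.store: Dict[str, str] = {}
        self.api_calls = 0              # number of paid calls actually made

    def __call__(self, prompt: str) -> str:
        if prompt not in self.store:
            self.api_calls += 1
            self.store[prompt] = self.llm(prompt)
        return self.store[prompt]


# Toy usage: a stand-in "LLM" that upper-cases its prompt.
print(adapt_prompt(["Q: 1+1 A: 2", "Q: 2+3 A: 5"], "Q: 3+4 A:", max_prompt_tokens=8))
cached = CompletionCache(lambda p: p.upper())
cached("hello")
cached("hello")
print(cached.api_calls)  # 1: the second call was served from the cache
```

Dropping examples trades some answer quality for a shorter, cheaper prompt, and a cache only helps when queries actually repeat, so both need to be checked against the target task.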

To show the potential of these ideas, the researchers implement and evaluate a basic version of FrugalGPT built on the LLM cascade. For each dataset and task, FrugalGPT learns how to adaptively route queries to different combinations of LLMs, such as ChatGPT, GPT-3, and GPT-4. Compared with the best individual LLM API, FrugalGPT can match its performance on the downstream task while cutting inference cost by up to 98%. Alternatively, for the same cost, it can improve performance by up to 4%.
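
The cascade idea can be sketched as follows: query the cheapest LLM first, score its answer, and escalate to a more expensive LLM only when the score falls below that stage's threshold. This is a minimal Python sketch under stated assumptions, not FrugalGPT's implementation: `CascadeStage`, `run_cascade`, the per-call costs, and the scoring function are hypothetical, and the toy lambdas stand in for real API clients.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CascadeStage:
    name: str
    llm: Callable[[str], str]  # hypothetical wrapper around one LLM API
    cost_per_call: float       # illustrative flat cost per query
    threshold: float           # accept this stage's answer if score >= threshold


def run_cascade(query: str, stages: List[CascadeStage],
                scorer: Callable[[str, str], float]) -> Tuple[str, str, float]:
    """Try LLMs from cheapest to most expensive; stop at the first answer the
    scorer rates at or above the stage threshold. The last stage always answers.
    Returns (answer, name of the model that answered, total cost paid)."""
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    total_cost = 0.0
    for stage in stages[:-1]:
        answer = stage.llm(query)
        total_cost += stage.cost_per_call
        if scorer(query, answer) >= stage.threshold:
            return answer, stage.name, total_cost
    last = stages[-1]  # its threshold is never checked
    total_cost += last.cost_per_call
    return last.llm(query), last.name, total_cost


# Toy usage: the "small" model only knows one answer and says "unsure" otherwise.
def toy_scorer(query: str, answer: str) -> float:
    return 0.0 if answer == "unsure" else 1.0


stages = [
    CascadeStage("small", lambda q: "4" if q == "2+2" else "unsure", 0.001, 0.5),
    CascadeStage("large", lambda q: "42", 0.03, 0.0),
]
print(run_cascade("2+2", stages, toy_scorer))            # answered by "small"
print(run_cascade("hard question", stages, toy_scorer))  # escalated to "large"
```

Ordering the stages by cost means easy queries never reach the expensive model; how much is saved depends on how well the scorer separates reliable answers from unreliable ones.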

FrugalGPT’s LLM cascade requires labeled examples for training. In addition, the training and test examples should come from the same or a similar distribution for the cascade to be effective. Finally, learning the LLM cascade itself takes time and energy.
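
To illustrate what training a cascade on labeled examples can involve, the sketch below picks the acceptance threshold of a two-stage cascade by simple grid search. For each candidate threshold it replays every labeled example, checks whether the cheap model's answer would be accepted and whether the final answer is correct, and keeps the cheapest threshold that meets an accuracy target. The grid search, `fit_threshold`, the example format, and the toy labels are assumptions for illustration; the article does not describe FrugalGPT's actual optimization procedure.

```python
import math
from typing import List, Optional, Tuple

# One labeled example: (scorer output for the cheap model's answer,
#                       cheap answer correct?, expensive answer correct?)
Example = Tuple[float, bool, bool]


def fit_threshold(examples: List[Example], cheap_cost: float, expensive_cost: float,
                  target_accuracy: float) -> Optional[Tuple[float, float, float]]:
    """Grid-search the acceptance threshold of a two-stage cascade on labeled data.

    Returns (threshold, accuracy, average cost per query) for the cheapest
    threshold whose accuracy meets the target, or None if none does.
    A threshold of math.inf means "always escalate to the expensive model"."""
    if not examples:
        raise ValueError("labeled examples are required")
    candidates = sorted({score for score, _, _ in examples}) + [math.inf]
    best: Optional[Tuple[float, float, float]] = None
    for threshold in candidates:
        correct = 0
        total_cost = 0.0
        for score, cheap_ok, expensive_ok in examples:
            if score >= threshold:       # accept the cheap model's answer
                correct += int(cheap_ok)
                total_cost += cheap_cost
            else:                        # escalate: both calls are paid for
                correct += int(expensive_ok)
                total_cost += cheap_cost + expensive_cost
        accuracy = correct / len(examples)
        avg_cost = total_cost / len(examples)
        if accuracy >= target_accuracy and (best is None or avg_cost < best[2]):
            best = (threshold, accuracy, avg_cost)
    return best


examples = [(0.9, True, True), (0.8, True, True), (0.3, False, True), (0.2, False, False)]
print(fit_threshold(examples, cheap_cost=1.0, expensive_cost=10.0, target_accuracy=0.75))
# (0.8, 0.75, 6.0)
```

Because the threshold is fitted to the score distribution of the labeled examples, it may no longer meet the accuracy target if test queries are distributed differently, which is exactly the distribution requirement noted in the paragraph above.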

FrugalGPT seeks a balance between performance and cost, but other factors, including latency, fairness, privacy, and environmental impact, also matter in practice. The team believes future studies should incorporate these factors into the optimization without sacrificing performance or cost-effectiveness. The uncertainty of LLM-generated results also needs to be carefully quantified before they are used in risk-critical applications.