Gorilla, a fine-tuned LLaMA-based model from UC Berkeley researchers, outperforms GPT-4 at writing API calls.


A recent breakthrough in Artificial Intelligence is the introduction of Large Language Models (LLMs). These models process and generate language far more fluently than earlier systems, which broadens what Natural Language Processing (NLP) and Natural Language Understanding (NLU) can do. They perform well on a wide range of tasks, including text summarization, question answering, content generation, and language translation. They can follow complex textual prompts, including ones that involve reasoning and logic, and identify patterns and relationships in the text.

Although language models have recently shown strong performance and rapid progress across a range of tasks, using tools effectively through API calls remains challenging for them. Even well-known LLMs like GPT-4 often suggest incorrect API calls and struggle to produce exact input arguments. To address this problem, researchers from UC Berkeley and Microsoft Research have presented Gorilla, a fine-tuned LLaMA-based model that outperforms GPT-4 at writing API calls. Gorilla helps select the proper API, improving the ability of LLMs to work with external tools to carry out specific tasks.

The team has also created APIBench, a dataset made up of a sizable corpus of APIs with overlapping functionality. It was built by collecting ML APIs from the public model hubs TorchHub, TensorHub, and HuggingFace. Every API from TorchHub and TensorHub is included, while for HuggingFace the top 20 models in each task category are chosen. The researchers then generate ten synthetic user query prompts for each API using the self-instruct method.
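The selection and pairing steps described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' pipeline: the ApiEntry record, the downloads field used to rank HuggingFace models, and the placeholder_instruction function (standing in for a self-instruct call to an LLM) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiEntry:
    hub: str            # "TorchHub", "TensorHub", or "HuggingFace"
    task: str           # task category, e.g. "image-classification"
    name: str           # hypothetical API identifier
    downloads: int = 0  # assumed ranking signal for "top 20"


def select_apis(entries, top_k=20):
    """Keep every TorchHub/TensorHub entry and the top_k HuggingFace entries per task."""
    kept = [e for e in entries if e.hub != "HuggingFace"]
    by_task = defaultdict(list)
    for e in entries:
        if e.hub == "HuggingFace":
            by_task[e.task].append(e)
    for task in sorted(by_task):
        ranked = sorted(by_task[task], key=lambda e: e.downloads, reverse=True)
        kept.extend(ranked[:top_k])
    return kept


def placeholder_instruction(api, i):
    """Stand-in for a self-instruct generation step (an LLM call in practice)."""
    return f"[instruction {i + 1} for {api.name}]"


def build_pairs(apis, make_instruction=placeholder_instruction, per_api=10):
    """Pair each API with per_api generated instructions."""
    return [(make_instruction(api, i), api) for api in apis for i in range(per_api)]
```

With 2 TorchHub entries and 25 HuggingFace entries in a single task category, select_apis keeps 22 APIs and build_pairs yields 220 instruction-API pairs.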

The researchers fine-tuned Gorilla using the APIBench dataset and document retrieval. Gorilla, a 7-billion-parameter model, outperforms GPT-4 in the functional correctness of its API calls and reduces hallucination errors. The effective integration of the document retriever with Gorilla shows that LLMs can use tools more precisely. Gorilla’s improved API call generation and its ability to adapt to changes in documentation improve the applicability and reliability of the model’s outputs. This matters because it lets LLMs keep up with frequently updated documentation, giving users more accurate and current information.
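The idea of pairing a model with a document retriever can be illustrated with a small sketch. Everything here is an assumption for illustration: the token-overlap scoring is a toy stand-in for a real retriever, the documentation strings are made up, and the prompt wording is not taken from the researchers' code.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs):
    """Return the doc sharing the most tokens with the query (toy retriever)."""
    q = tokenize(query)
    return max(docs, key=lambda d: len(q & tokenize(d)))


def build_prompt(query, docs):
    """Append the best-matching documentation entry to the user query."""
    doc = retrieve(query, docs)
    return f"{query}\nUse this API documentation for reference: {doc}"


# Made-up documentation entries; replacing an entry changes what the model sees.
docs_v1 = [
    "image classifier v1: classify an image into labels",
    "translator: translate english text to french",
]
docs_v2 = [
    "image classifier v2: classify an image into labels, new argument top_k",
    "translator: translate english text to french",
]
prompt_old = build_prompt("classify this image", docs_v1)
prompt_new = build_prompt("classify this image", docs_v2)
```

Because the documentation is supplied at query time rather than memorized, swapping docs_v1 for docs_v2 changes the prompt without retraining the model, which is the property the paragraph above describes.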

One example shared by the researchers shows Gorilla correctly recognizing the task and returning a fully qualified API call. In the same example, GPT-4 produced an API request for a model that does not exist, which suggests it did not understand the task, and Claude chose the wrong library, failing to identify the right resource. Gorilla’s output, in contrast, was accurate, reflecting both its improved performance and its grasp of the task.
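The three outcomes in this example (a correct call, a call to something that does not exist, and a call to a real but wrong API) can be told apart with a simple check. The sketch below is a simplification under stated assumptions: it compares API names by exact string match against a hypothetical set of known APIs, which is not necessarily how the researchers score outputs.

```python
def classify_call(predicted, expected, known_apis):
    """Label a generated API name as correct, hallucinated, or a wrong choice."""
    if predicted == expected:
        return "correct"
    if predicted not in known_apis:
        # The model named an API that is not in the reference set at all.
        return "hallucination"
    # The API exists but is not the one the task calls for.
    return "wrong_api"


# Hypothetical API names used only for illustration.
known = {"hub_a/speech_model", "hub_b/vision_model"}
results = [
    classify_call("hub_a/speech_model", "hub_a/speech_model", known),
    classify_call("hub_a/made_up_model", "hub_a/speech_model", known),
    classify_call("hub_b/vision_model", "hub_a/speech_model", known),
]
```

Separating hallucinations from wrong choices is useful because the two failures point to different problems: inventing an API versus picking an unsuitable one.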

In conclusion, Gorilla is a notable addition to the list of language models because it directly addresses the problem of writing API calls. Its capabilities help reduce problems related to hallucination and reliability.