Meet Vicuna: An Open-Source Chatbot That Achieves 90% of ChatGPT's Quality and Is Based on LLaMA-13B


Large language models have recently become hugely popular and are constantly in the headlines. GPT-4, released in March 2023, is one of the best-known transformer models and the technology behind OpenAI's famous ChatGPT. The chatbot can generate text and answer questions in a human-like way. After the great success of GPT-3.5, GPT-4 is the latest milestone in scaling up deep learning and generative Artificial Intelligence.

Unlike its predecessor, GPT-3.5, which only lets ChatGPT take textual input, the latest GPT-4 is multimodal in nature, meaning it accepts both text and images as input. Another such model, LLaMA (Large Language Model Meta AI), was released by Meta AI in February 2023. The researchers behind LLaMA reported that the 13B-parameter version outperformed the much larger 175B-parameter GPT-3 on most NLP benchmarks, and that the largest model was even competitive with state-of-the-art models such as PaLM and Chinchilla.

Now comes Vicuna, an open-source chatbot with 13B parameters, developed by a team from UC Berkeley, CMU, Stanford, and UC San Diego and trained by fine-tuning the base LLaMA model on user-shared conversations. The roughly 70K conversations were collected from ShareGPT via its public APIs. ShareGPT is a Chrome extension that lets users share their previous ChatGPT conversations with others in a single click.
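For context, user-shared ShareGPT exports are commonly seen as JSON records holding a list of conversation turns; the sketch below shows one such record and a naive way to flatten it into a single training string. The `from`/`value` field names, role labels, and separator here are illustrative assumptions, not Vicuna's exact training template.

```python
# A minimal sketch of a ShareGPT-style record and a naive flattening step.
# Field names and role labels are assumptions for illustration only.
record = {
    "id": "example-001",
    "conversations": [
        {"from": "human", "value": "What is gradient checkpointing?"},
        {"from": "gpt", "value": "A technique that trades compute for memory ..."},
        {"from": "human", "value": "When would I use it?"},
        {"from": "gpt", "value": "When activations do not fit in GPU memory ..."},
    ],
}

def to_training_text(conversations: list[dict]) -> str:
    """Flatten a multi-turn conversation into one supervised training string."""
    role = {"human": "USER", "gpt": "ASSISTANT"}
    return "\n".join(f'{role[t["from"]]}: {t["value"]}' for t in conversations)

print(to_training_text(record["conversations"]))
```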

The training, serving, and evaluation code has been shared at https://github.com/lm-sys/FastChat. The researchers mention that while collecting the conversation data, they converted the HTML back into Markdown and filtered out conversations that were inappropriate or of low quality. Moreover, lengthy conversations were divided into smaller segments so that they fit the model's maximum context length.
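The splitting step can be pictured with a short sketch: greedily pack whole turns into segments whose token count stays under the context limit. The whitespace "tokenizer" and the greedy packing strategy are simplifying assumptions; the real pipeline would use the model's own tokenizer.

```python
# A minimal sketch of chunking long conversations to fit the context window.
# n_tokens() is a crude whitespace stand-in for a real tokenizer.
MAX_TOKENS = 2048  # Vicuna's maximum context length

def n_tokens(text: str) -> int:
    return len(text.split())

def split_conversation(turns: list[str], max_tokens: int = MAX_TOKENS) -> list[list[str]]:
    """Greedily pack whole turns into segments under the token budget."""
    segments, current, used = [], [], 0
    for turn in turns:
        t = n_tokens(turn)
        if current and used + t > max_tokens:
            segments.append(current)   # close the full segment
            current, used = [], 0
        current.append(turn)
        used += t
    if current:
        segments.append(current)
    return segments
```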

The model has been built on top of Stanford's Alpaca, with certain improvements such as –

  1. Memory optimization – The maximum context length has been increased from 512 in Alpaca to 2048, which substantially increases GPU memory requirements. The added memory pressure is addressed with gradient checkpointing and FlashAttention (see the sketch after this list).
  2. Multi-round conversations – The training loss has been adjusted to account for multi-round conversations, so the chatbot responds more accurately across multiple turns for a higher-quality experience.
  3. Cost reduction – SkyPilot managed spot instances have been used to cut training costs by running on cheaper spot instances with auto-recovery from preemptions and automatic zone switching. This helped train the 7B model for around $140 and the 13B model for around $300.
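As referenced in item 1, here is a minimal sketch of enabling the memory optimization with the Hugging Face transformers API; the model path is a placeholder, and FlashAttention is omitted because at the time it required custom attention patches rather than a one-line switch.

```python
# A hedged sketch of the memory-saving setup: gradient checkpointing trades
# extra compute (recomputing activations in the backward pass) for memory.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/llama-13b")  # placeholder path
model.gradient_checkpointing_enable()  # recompute activations instead of storing them
model.config.use_cache = False         # the KV cache conflicts with checkpointing
```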

The team behind Vicuna evaluated its performance using GPT-4 as a judge. Vicuna got some great results, achieving more than 90% of the quality of well-known chatbots such as ChatGPT and Google Bard, and it outperformed models like LLaMA and Stanford Alpaca in more than 90% of cases. At a total training cost of around $300, it is a remarkably cost-effective solution for chatbot development.
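The GPT-4-as-judge setup can be sketched as a single comparison call, assuming the pre-1.0 OpenAI Python SDK; the prompt wording and scoring format below are illustrative assumptions, not the exact rubric the Vicuna team used.

```python
# A minimal sketch of GPT-4-as-judge pairwise scoring (pre-1.0 OpenAI SDK).
import openai

openai.api_key = "sk-..."  # placeholder

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 to score two candidate answers to the same question."""
    prompt = (
        f"Question: {question}\n\n"
        f"Assistant A: {answer_a}\n\n"
        f"Assistant B: {answer_b}\n\n"
        "Rate each answer from 1 to 10 for helpfulness, relevance, and accuracy, "
        "then briefly explain. Reply as: 'A: <score> B: <score> <reason>'."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judging
    )
    return resp.choices[0].message.content
```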

Vicuna-13B is a great low-cost development in the domain of chatbots. Though it has limitations when it comes to reasoning and mathematics, with further research and refinement it can prove genuinely helpful and promising for future use.


Check out the Blog, GitHub and Demo. All credit for this research goes to the researchers on this project.

Tanya Malhotra is a final-year undergraduate at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.