Stability AI's CarperAI team has released two new open-access large language models (LLMs): FreeWilly1 and FreeWilly2.


FreeWilly1 and its successor FreeWilly2 are powerful new open-access Large Language Models (LLMs) developed by Stability AI’s CarperAI team. Both models show exceptionally strong reasoning performance across a wide range of benchmarks. FreeWilly1 is built on the original LLaMA 65B foundation model and was fine-tuned with supervised fine-tuning (SFT) on instruction data in the industry-standard Alpaca format. FreeWilly2 builds on the LLaMA 2 70B base model and reaches performance on par with GPT-3.5 on some tasks.
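For context, the Alpaca SFT format mentioned above is a simple plain-text template that wraps each instruction and its target response. The sketch below shows how such a training example might be rendered; the exact template used for FreeWilly is not specified in the announcement, so treat the wording and field names as illustrative.

```python
# Minimal sketch of an Alpaca-style SFT example. The exact template used for
# FreeWilly is an assumption here, not something confirmed by the announcement.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(instruction: str, response: str) -> str:
    """Render one supervised fine-tuning example as a single training string."""
    return ALPACA_TEMPLATE.format(instruction=instruction) + response

print(format_example(
    "Explain why the sky appears blue.",
    "Sunlight is scattered by air molecules; shorter blue wavelengths scatter the most.",
))
```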

The FreeWilly models’ training was heavily influenced by Microsoft’s ground-breaking approach described in the paper “Orca: Progressive Learning from Complex Explanation Traces of GPT-4.” The team prompted language models with high-quality instructions to generate their own version of the dataset, which contains 600,000 data points (approximately 10% of the dataset size used in the original Orca work).
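The generation step can be pictured as pairing each source instruction with a system prompt that asks a teacher model for a step-by-step explanation, then keeping the model’s answer as the training target. The sketch below assumes a hypothetical `query_model` helper standing in for whatever LLM API is used; it illustrates the idea rather than reproducing CarperAI’s actual pipeline.

```python
# Illustrative sketch of Orca-style data generation: each instruction is paired
# with a system prompt that elicits an explanation trace from a teacher model.
# `query_model` is a hypothetical helper, not a real API from the announcement.
from typing import Callable

SYSTEM_PROMPTS = [
    "You are a helpful assistant. Think step by step and justify your answer.",
    "Explain your reasoning in detail before giving the final answer.",
]

def generate_examples(instructions: list[str],
                      query_model: Callable[[str, str], str]) -> list[dict]:
    examples = []
    for i, instruction in enumerate(instructions):
        system = SYSTEM_PROMPTS[i % len(SYSTEM_PROMPTS)]
        completion = query_model(system, instruction)  # teacher model's explanation trace
        examples.append({
            "system": system,
            "instruction": instruction,
            "response": completion,
        })
    return examples
```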

Using this method, the researchers generated 500,000 examples with a simpler LLM and an additional 100,000 with a more sophisticated LLM. They thoroughly screened these datasets, removing examples originating from evaluation benchmarks to guarantee fair comparisons. That the FreeWilly models perform exceptionally well across multiple benchmarks despite training on only a tenth of the sample size used in the original Orca paper validates this approach to synthetically generated data.
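Benchmark decontamination of this kind is often implemented with simple overlap checks. The sketch below filters training examples by n-gram overlap against benchmark text, which is one common heuristic and only an assumption about how such screening might look; the team’s exact criteria are not described.

```python
# Sketch of benchmark decontamination via n-gram overlap (a common heuristic;
# the actual filtering criteria used for FreeWilly are not detailed publicly).
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(examples: list[dict], benchmark_texts: list[str]) -> list[dict]:
    benchmark_grams: set[tuple[str, ...]] = set()
    for text in benchmark_texts:
        benchmark_grams |= ngrams(text)
    # Drop any training example that shares a long n-gram with a benchmark item.
    return [
        ex for ex in examples
        if not (ngrams(ex["instruction"] + " " + ex["response"]) & benchmark_grams)
    ]
```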

The researchers evaluated the models with EleutherAI’s lm-eval-harness, extended with the AGIEval benchmark. The findings show that both FreeWilly models excel at intricate reasoning, recognizing linguistic nuance, and resolving difficult questions in specialized disciplines such as law and mathematics.
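Readers who want to run a comparable evaluation can do so through the harness’s Python entry point. The snippet below is a rough sketch assuming a recent lm-evaluation-harness release that exposes `simple_evaluate`, a Hugging Face model ID of `stabilityai/FreeWilly2`, and a couple of standard tasks; task names (including the AGIEval additions) vary by harness version, and this is not the team’s exact configuration.

```python
# Rough sketch of running benchmarks with EleutherAI's lm-evaluation-harness.
# Assumes a recent harness version exposing `simple_evaluate`; the model ID and
# task list are illustrative, not the team's exact evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face causal-LM backend
    model_args="pretrained=stabilityai/FreeWilly2",    # illustrative model ID
    tasks=["arc_challenge", "hellaswag"],
    num_fewshot=0,
    batch_size=4,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```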

The team believes the two models advance natural-language understanding and open up possibilities that were previously out of reach. They look forward to seeing all the innovative ways these models are put to use across the AI community.
