
Microsoft open-sources Phi-3 Mini small language model that outperforms Llama 2



Microsoft Corp. researchers today open-sourced Phi-3 Mini, a language model with 3.8 billion parameters that can outperform neural networks more than 10 times its size.

The company says that Phi-3 Mini is compact enough to run on a 2022 iPhone. In contrast, the most advanced large language models on the market are often too large to fit on a single high-end data center graphics card.

Phi-3 Mini is based on a popular language model design known as the decoder-only Transformer architecture. A Transformer is a type of neural network that evaluates the context of a word when trying to determine its meaning. Typically, such models go about the task by analyzing the text before and after the word in question.

The decoder-only Transformer is a variation of the architecture that uses less contextual information to make decisions. Rather than evaluating the text before and after a word, it only analyzes the prose that precedes that word. Decoder-only models are often more adept at text generation tasks than standard Transformer models, and they require less hardware to run.
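
In practice, that one-directional view is enforced with a causal attention mask. Below is a minimal NumPy sketch of the idea (illustrative code, not Microsoft's implementation): positions above the diagonal, i.e. future tokens, are blocked before the softmax, so each token can only "see" what precedes it.

```python
import numpy as np

def causal_attention(queries, keys):
    """Toy self-attention where each token may only attend to itself and earlier tokens."""
    n, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)              # similarity between every pair of positions
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal = future tokens
    scores[future] = -np.inf                            # forbid looking ahead
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return weights / weights.sum(axis=1, keepdims=True)

# Four tokens with 8-dimensional embeddings: row i ends up with zero weight on any column j > i.
tokens = np.random.randn(4, 8)
print(causal_attention(tokens, tokens).round(2))
```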

Microsoft’s researchers based Phi-3 Mini on a design similar to Llama 2, a popular LLM series developed by Meta Platforms Inc. The researchers reused Llama 2’s tokenizer, a component that translates text into a form language models can more easily understand. Phi-3 Mini’s similar design allows it to be used together with open-source tools developed for Llama 2.
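
To make the tokenizer's role concrete, here is a short sketch using the open-source Hugging Face transformers library. The model identifier below is an assumption about how the checkpoint is published, not a detail from the article.

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint name; substitute whatever identifier Microsoft publishes.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

ids = tokenizer.encode("Small models can punch above their weight.")
print(ids)                                   # integer IDs, the form the model actually consumes
print(tokenizer.convert_ids_to_tokens(ids))  # each token maps back to a few characters of text
```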

But the reason Phi-3 Mini can outperform significantly larger LLMs isn’t its architecture. Rather, “the innovation lies entirely in our dataset for training,” the Microsoft researchers who developed the model detailed in an academic paper.

The dataset is an expanded version of the information repository the company used to build Phi-2, a previous-generation small language model. Phi-3 Mini’s dataset comprises 3.3 trillion tokens’ worth of information. A token is a unit of data made up of a few characters or numbers.

Phi-3 Mini was trained on “heavily filtered” information sourced from the web. According to Microsoft, its researchers only included information that could be used to enhance the model’s reasoning capabilities. They removed all other items from the dataset, including web pages that contained some useful knowledge, but not enough of it to maximize the effectiveness of the AI learning process.
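
The paper doesn't spell out the filtering rules, but the general shape of such a pipeline can be sketched: score every document with a quality heuristic and keep only those above a threshold. The heuristic below is a made-up placeholder; the real filter would be far more sophisticated.

```python
def reasoning_score(text: str) -> float:
    """Placeholder heuristic: density of reasoning-style markers per word."""
    markers = ("therefore", "because", "for example", "step")
    hits = sum(text.lower().count(m) for m in markers)
    return hits / max(len(text.split()), 1)

def filter_corpus(documents, threshold=0.01):
    # Keep only pages judged likely to teach the model to reason.
    return [doc for doc in documents if reasoning_score(doc) >= threshold]

docs = [
    "Each step doubles the count, therefore the total is 2^n. For example, n = 3 gives 8.",
    "Buy now!! Limited offer!! Click here!!",
]
print(filter_corpus(docs))  # only the first document survives
```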

Microsoft trained Phi-3 Mini in two phases. First, it provided the model with the filtered dataset its researchers retrieved from the open web. Then, Phi-3 Mini was given an “even more heavily filtered” subset of the dataset from the first training phase, along with synthetic information, or training data generated by an AI.
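
In pseudocode terms, that curriculum looks roughly like the sketch below. The classes and data here are stand-ins invented for illustration; the real training loop computes next-token-prediction losses and gradient updates.

```python
class TinyLM:
    """Stand-in for the real model; it just records what it was trained on."""
    def __init__(self):
        self.seen = []

    def step(self, batch):
        self.seen.append(batch)  # a real step would compute a loss and update weights

def train(model, corpus):
    for batch in corpus:
        model.step(batch)

model = TinyLM()

# Phase 1: heavily filtered web text builds broad knowledge and language skill.
phase1 = ["filtered web page A", "filtered web page B", "filtered web page C"]
train(model, phase1)

# Phase 2: an even more heavily filtered subset of phase 1, plus AI-generated text.
phase2 = ["filtered web page A", "synthetic lesson written by another model"]
train(model, phase2)

print(len(model.seen))  # 5 batches seen across both phases
```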

Microsoft evaluated Phi-3 Mini’s capabilities by comparing it against two larger open-source language models. One of those models was a version of Meta’s Llama 2 with 70 billion parameters. According to Microsoft, Phi-3 Mini scored higher than Llama 2 on the MMLU neural network evaluation test, which includes about 16,000 questions spanning dozens of topics.
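
MMLU is a multiple-choice test, so scoring comes down to comparing the model's chosen answer letter against an answer key. A trivial scorer, with made-up data for illustration:

```python
def mmlu_accuracy(predictions, answer_key):
    """Fraction of multiple-choice questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

# Toy example: three of the four predicted letters match the key.
print(mmlu_accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 0.75
```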

Phi-3 Mini outperformed Meta’s model despite requiring significantly less hardware. During their testing, Microsoft’s researchers managed to run the model on an iPhone 14.

In the paper detailing Phi-3 Mini, the researchers also previewed two larger versions of the model that have not yet been open-sourced. They feature 7 billion and 14 billion parameters. The two models scored 6% and 9% higher, respectively, than Phi-3 Mini on the MMLU test.

Photo: efes/Pixabay

 
