Apple researchers open-source OpenELM language model series


Apple Inc. researchers today open-sourced a series of small language models, OpenELM, that can outperform similarly sized neural networks.

OpenELM’s debut comes a day after Microsoft Corp. introduced a small language model lineup of its own. The first neural network in the series, Phi-3 Mini, features 3.8 billion parameters. Microsoft says that the AI can generate more accurate prompt responses than Llama 2 70B, a large language model with 70 billion parameters.

Apple’s OpenELM series comprises four models of varying sizes. The smallest features 270 million parameters, while the largest packs about 3 billion. Apple trained the neural networks on a dataset of about 1.8 trillion tokens, units of data that each contain a few characters.
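To illustrate what tokens look like in practice, the snippet below splits a sentence using the GPT-2 tokenizer from the open-source transformers library. It’s a generic illustration: the article doesn’t detail Apple’s own tokenizer, so GPT-2’s is used here purely as a stand-in.

```python
# Generic illustration of tokenization, using the GPT-2 tokenizer from
# Hugging Face's transformers library as a stand-in (the tokenizer Apple
# used for OpenELM is not described in this article).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Apple open-sourced the OpenELM series."
ids = tokenizer.encode(text)

# Each ID maps back to a token: a short chunk of characters, often a
# whole word or a fragment of one.
print(tokenizer.convert_ids_to_tokens(ids))
```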

The OpenELM series is based on a neural network design known as the decoder-only Transformer architecture. It’s also the basis of Microsoft’s newly debuted Phi-3 Mini, as well as many larger LLMs. A neural network based on this architecture takes into account the text that precedes a word when trying to determine the word’s meaning, which boosts processing accuracy.
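In practice, this “look only backward” rule is enforced with a so-called causal mask applied to the model’s attention scores. The sketch below shows the general idea with toy values; it’s a schematic illustration of decoder-only attention, not code from OpenELM or Phi-3.

```python
# Schematic sketch of the causal mask used by decoder-only Transformers:
# position i may only attend to positions 0..i, so each word is interpreted
# in light of the text that precedes it. Toy values, not OpenELM's code.
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Lower-triangular boolean matrix: row i is True for columns 0..i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

scores = np.random.rand(5, 5)        # toy attention scores for 5 tokens
scores[~causal_mask(5)] = -np.inf    # block attention to future tokens

# After a softmax over each row, the -inf entries receive zero weight,
# so every token attends only to itself and the tokens before it.
```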

A language model is made up of interconnected building blocks called layers. The first layer takes the prompt provided by the user, performs some of the processing necessary to generate a response and then sends the processing results to the second layer. This workflow is then repeated multiple times until the input reaches the last AI layer, which outputs a prompt response.
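The loop below sketches that workflow in schematic form. The Layer and LanguageModel classes are stand-ins invented for illustration, not Apple’s implementation; a real Transformer layer performs attention and feed-forward computations at each step.

```python
# Schematic sketch of the layer-by-layer workflow described above.
# These classes are illustrative stand-ins, not Apple's implementation.
class Layer:
    def process(self, hidden_state):
        # A real Transformer layer would apply attention and feed-forward
        # computations here; this stub simply passes the data along.
        return hidden_state

class LanguageModel:
    def __init__(self, num_layers: int):
        self.layers = [Layer() for _ in range(num_layers)]

    def respond(self, prompt_representation):
        state = prompt_representation
        for layer in self.layers:   # each layer hands its output to the next
            state = layer.process(state)
        return state                # the last layer's output becomes the response
```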

In models based on the decoder-only Transformer architecture, all the layers are usually based on the same design. Apple says that its OpenELM model series takes a different approach.

The manner in which an AI layer processes user prompts is determined by numerical values called parameters. Those values are responsible for, among other tasks, determining which data points a language model takes into account when making a decision. An AI layer’s behavior is shaped not only by the kind of parameters it includes but also by how many of them there are.

In contrast with more traditional language models, OpenELM’s layers aren’t based on an identical design: each layer includes a different mix of parameters, an approach Apple’s paper refers to as layer-wise scaling. The researchers determined that this arrangement helps optimize the quality of responses. In an internal test, a 1.1-billion-parameter version of OpenELM managed to outperform a slightly larger model that was trained on twice as much data.
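The sketch below conveys the general idea: per-layer attention-head counts and feed-forward widths are interpolated between a minimum at the first layer and a maximum at the last, so earlier layers stay small while later ones grow. The values are made up for illustration and don’t reflect OpenELM’s actual configuration.

```python
# Hedged sketch of layer-wise scaling: layer sizes grow linearly from the
# first layer to the last. The min/max values are invented for illustration
# and are not OpenELM's actual configuration.
def layerwise_configs(num_layers, min_heads, max_heads, min_ffn, max_ffn):
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        configs.append({
            "layer": i,
            "attention_heads": round(min_heads + t * (max_heads - min_heads)),
            "ffn_multiplier": min_ffn + t * (max_ffn - min_ffn),
        })
    return configs

for cfg in layerwise_configs(num_layers=4, min_heads=4, max_heads=8,
                             min_ffn=1.0, max_ffn=4.0):
    print(cfg)  # earlier layers are smaller, later layers larger
```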

Alongside OpenELM, Apple today open-sourced several tools designed to help developers more easily incorporate the model series into their software projects. One of those tools is a library that makes it possible to run the models on iPhones and Macs. The library makes use of MLX, a framework Apple open-sourced in December to ease the task of optimizing neural networks for its internally designed chips.
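For developers, running one of the models on Apple silicon can look roughly like the snippet below, which uses the open-source mlx-lm package built on top of MLX. The model identifier is hypothetical: it assumes an MLX-converted OpenELM checkpoint is published under that name, which may differ in practice.

```python
# Rough sketch of running a small model on Apple silicon via the mlx-lm
# package, which builds on the MLX framework mentioned above. The repo
# name below is a hypothetical placeholder for an MLX-converted OpenELM
# checkpoint; the actual identifier may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/OpenELM-270M")  # hypothetical repo name
reply = generate(model, tokenizer, prompt="Summarize what OpenELM is.",
                 max_tokens=64)
print(reply)
```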

Image: Unsplash

 
