Cohere open-sources Aya 23 series of multilingual LLMs


Cohere Inc. today introduced Aya 23, a new family of open-source large language models that can understand 23 languages.

Toronto-based Cohere is an OpenAI competitor backed by more than $400 million in funding from Nvidia Corp., Oracle Corp. and other investors. It provides a set of LLMs optimized for the enterprise market. The company also offers Embed, a neural network designed to turn data into mathematical structures that language models can more easily understand. 

On launch, the newly debuted Aya 23 LLM series comprises two algorithms. The first features 8 billion parameters and is designed for use cases that require a balance between response quality and performance. For developers with more advanced requirements, Cohere has built a larger version of Aya that features 35 billion parameters.
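Because the models are released openly, the weights can in principle be loaded with the Hugging Face transformers library. The sketch below shows what that could look like for the smaller model; the repository id "CohereForAI/aya-23-8B" is an assumption made for illustration and is not stated in the article.

```python
# Loading an open checkpoint with Hugging Face transformers.
# The repo id below is an assumption, not confirmed by the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Translate to French: Where is the nearest train station?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```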

The latter edition, which is known as Aya-23-35B, is based on an LLM called Command R that the company introduced last March. Command R was Cohere’s flagship AI model until this past April, when the company debuted a more advanced algorithm. It supports prompts with up to 128,000 tokens, provides a built-in retrieval-augmented generation, or RAG, feature and can automatically perform tasks in external applications.

Under the hood, Aya-23-35B is based on an industry-standard LLM design known as the decoder-only Transformer architecture. Models that implement this design determine the meaning of each word in a user prompt by analyzing the word’s context, namely the preceding text. Such algorithms can generate more accurate output than many earlier neural networks.
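To make the "preceding text" point concrete, the following minimal sketch shows how a decoder-only model is typically used at inference time: each new token is predicted from the tokens produced so far, and nothing after the current position is visible to the model. The `model` object here is a stand-in for any network that maps token ids to next-token logits, not Aya's actual implementation.

```python
import torch

def greedy_decode(model, token_ids, max_new_tokens):
    """Autoregressive generation with a decoder-only model.

    `model` is a stand-in for any network mapping token ids of shape
    (batch, seq) to logits of shape (batch, seq, vocab).
    """
    for _ in range(max_new_tokens):
        logits = model(token_ids)                    # reads only the prefix
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        token_ids = torch.cat([token_ids, next_id], dim=-1)      # append, repeat
    return token_ids
```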

According to Cohere, Aya-23-35B improves upon the standard decoder-only Transformer architecture in several ways. The company’s enhancements have helped make the model more adept at understanding user prompts. 

Often, the mechanism that allows LLMs to determine the meaning of a word based on its context isn’t implemented as a single software module. Rather, it’s a collection of several software modules that each take a different approach to interpreting text. Aya 23 implements those components with an approach called grouped-query attention, which decreases their RAM use and thereby speeds up inference.
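A rough sketch of the idea, under a simplified single-layer setting: in grouped-query attention, several query heads share one key/value head, so the keys and values that must be kept in memory during generation shrink by the ratio of query heads to key/value heads. The shapes and weight layout below are illustrative, not Cohere’s actual configuration.

```python
import torch

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Causal attention in which groups of query heads share key/value heads.

    x:  (batch, seq, d_model)
    wq: (d_model, n_q_heads * head_dim)
    wk, wv: (d_model, n_kv_heads * head_dim) -- the smaller key/value
            projections are what save memory versus full multi-head attention.
    """
    b, s, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, s, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each key/value head so it lines up with its group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    # Causal mask: a position may only attend to itself and earlier tokens.
    mask = torch.triu(torch.ones(s, s, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    out = torch.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(b, s, -1)

# Illustrative sizes: 8 query heads share 2 key/value heads.
d_model, n_q, n_kv = 64, 8, 2
head_dim = d_model // n_q
x = torch.randn(1, 10, d_model)
wq = torch.randn(d_model, n_q * head_dim)
wk = torch.randn(d_model, n_kv * head_dim)
wv = torch.randn(d_model, n_kv * head_dim)
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # (1, 10, 64)
```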

Aya-23-35B also implements a technology called rotary positional embeddings. To interpret text, an LLM takes into account not only the meaning of words but also their position within a sentence. Using rotary positional embeddings, LLMs can process word location information more effectively, which improves the quality of their output.
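A minimal sketch of rotary positional embeddings under one common formulation: pairs of channels in each query (and key) vector are rotated by an angle proportional to the token’s position, so relative distances between tokens end up encoded directly in the attention dot products. Details such as the base frequency and the half-split channel pairing are illustrative defaults, not Aya’s exact settings.

```python
import torch

def apply_rotary_embeddings(x, base=10000.0):
    """Rotate channel pairs of x (batch, heads, seq, head_dim) by
    position-dependent angles (half-split pairing)."""
    b, h, s, d = x.shape
    half = d // 2
    # Each channel pair gets its own rotation frequency.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()            # (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1, x2) pair by the token's position angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 4, 10, 16)            # (batch, heads, seq, head_dim)
print(apply_rotary_embeddings(q).shape)  # same shape, positions now encoded
```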

Cohere trained Aya 23 on a multilingual dataset, also called Aya, that it open-sourced earlier this year. The dataset comprises 513 million LLM prompts and answers in 114 languages. It was developed through an open-source initiative that drew contributions from about 3,000 people.

The project also saw Cohere release Aya-101, an LLM that understands 101 languages. According to the company, its new Aya-23-35B model significantly outperformed the former algorithm in a series of internal evaluations. It also proved more adept than several other open-source LLMs at multilingual text processing tasks.

Image: Cohere

 
