Baichuan-13B: China’s Open-Source Large Language Model

Wang Xiaochuan, the founder of the Chinese search engine Sogou, has released a new large language model called Baichuan-13B through his startup, Baichuan Intelligence. Commercial use is currently limited to developers and researchers who have obtained official authorization. Wang recently posted on Weibo that “China needs its own OpenAI,” and the release of Baichuan-13B, the company’s next-generation large language model, brings him one step closer to realizing that vision. Baichuan launched three months ago and rapidly attracted a group of investors willing to put up $50 million. Owing largely to its founder’s strong background in computer science, the company is now regarded as one of China’s most promising developers of large language models.

Baichuan-13B follows the same Transformer architecture as GPT and most homegrown Chinese models. It has 13 billion parameters (the learned variables a model uses to generate and analyze text) and is bilingual, having been trained on both Chinese and English data. The model is open source and available for commercial use, according to its GitHub page.

Following Baichuan-7B, Baichuan Intelligent Technology developed Baichuan-13B, a commercially usable open-source large language model with 13 billion parameters. On authoritative Chinese and English benchmarks, it outperforms models of a similar size. This release includes both a pre-trained base version (Baichuan-13B-Base) and an aligned chat version (Baichuan-13B-Chat).

Features

Baichuan-13B builds on Baichuan-7B by increasing the parameter count to 13 billion, and it was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B. It is currently the open-source model trained on the most data in the 13B size class. It uses ALiBi positional encoding, has a context window of 4,096 tokens, and supports both Chinese and English.
The pre-trained model serves as a “base” for developers, while ordinary users have greater demand for an aligned model with dialogue capabilities. This open-source release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong dialogue capabilities, works out of the box, and can be deployed with only a few lines of code.
To encourage wider adoption, the team is also releasing int8 and int4 quantized versions, which make inference more efficient. These can be deployed on consumer-grade graphics cards such as the Nvidia 3090, whereas the non-quantized version requires significantly more powerful hardware.
Open source and free for commercial use: developers who apply by email and obtain an official commercial license can use Baichuan-13B for commercial purposes at no cost.
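
The ALiBi positional encoding mentioned in the feature list can be illustrated with a short sketch. ALiBi adds a penalty to each attention score that grows linearly with the distance between the query and key positions, with a different slope per attention head. The code below is a minimal pure-Python illustration of that idea and not Baichuan's implementation: the function names are hypothetical, the head count of 8 is chosen only for the demonstration, and the slope formula shown is the simple case for a power-of-two number of heads. It also folds a causal mask into the same table, which is a presentation choice for this sketch.

```python
def alibi_slopes(num_heads):
    """Per-head slopes for a power-of-two head count (simplified case)."""
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j], the value added to the attention score
    between query position i and key position j in head h.

    Past positions (j <= i) get -slope * distance; future positions
    are masked with -inf so a decoder cannot attend to them.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


# Hypothetical demo values: 8 heads, a 4-token sequence.
bias = alibi_bias(num_heads=8, seq_len=4)
for row in bias[0]:  # first head, which has the steepest slope (0.5)
    print(row)
```

Because the penalty depends only on relative distance, no position embeddings need to be added to the token inputs.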

Baichuan-13B was trained on about 1.4 trillion tokens. By comparison, GPT-3 was trained on roughly 300 billion tokens, according to OpenAI. The Baichuan team doubled in size within three months, reaching fifty members, and last month publicly released its first model, Baichuan-7B, which has seven billion parameters. The version of Baichuan-13B issued two days ago is the foundational model. It is offered at no cost to researchers and developers who have been granted official authorization to use it commercially. Whether and when the model will be released for widespread public use has not been announced.

In light of recent U.S. restrictions on the export of artificial intelligence (AI) chips to China, it is particularly noteworthy that variants of this model can run on consumer hardware such as Nvidia’s 3090 graphics cards.
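
A rough calculation shows why the quantized variants matter for consumer hardware. The sketch below estimates only the memory needed to hold the weights at each precision. It assumes exactly 13 billion parameters and a 24 GB card (the memory size of the Nvidia 3090, a figure not stated in the article), and it ignores activations, the attention cache, and framework overhead, so real requirements are higher. The function and constant names are illustrative.

```python
PARAMS = 13_000_000_000  # assumed: exactly 13 billion parameters

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

GPU_MEMORY_GB = 24  # assumed: memory of a single Nvidia 3090


def weight_memory_gb(num_params, precision):
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(PARAMS, precision)
    verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"{precision}: {gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions the 16-bit weights alone come to about 26 GB, more than a single 24 GB card can hold, while the int8 and int4 versions need about 13 GB and 6.5 GB.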

Baichuan Intelligent Technology states that its team has not developed any applications based on Baichuan-13B for any platform, including iOS, Android, or the web. Users are urged not to use the Baichuan-13B model for illegal or harmful purposes, such as activities that endanger national or social security, and not to deploy it in Internet services without the necessary security reviews and filings. The company hopes that all users will follow these principles so that technological development stays within the bounds of the law.
