Meet ChatGLM: An Open-Source NLP Model Trained on 1T Tokens and Capable of Understanding English/Chinese


ChatGLM (alpha internal test version: QAGLM) is a chatbot designed specifically for Chinese users. It is based on a 100-billion-parameter Chinese-English language model with question-answering and dialogue capabilities. It has been fine-tuned, the invitation-only internal test is live, and its scope will grow over time. In addition, following the open-sourcing of the 100-billion-parameter base model GLM-130B, the researchers have released ChatGLM-6B, the newest Chinese-English bilingual dialogue GLM model, which, when paired with model quantization technology, can be deployed locally on consumer-grade graphics cards: at the INT4 quantization level, just 6 GB of video RAM is needed. With 6.2 billion parameters, ChatGLM-6B is smaller than the 100-billion-parameter models, but it greatly lowers the threshold for user deployment. After bilingual training on roughly 1T tokens of Chinese and English, supplemented by supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback, and other techniques, it generates answers that align with human preferences.

ChatGLM

ChatGLM takes the concept of ChatGPT as its starting point, injects code pre-training into the 100-billion-parameter base model GLM-130B, and achieves human-intention alignment using supervised fine-tuning and other methods. The exclusive 100-billion-parameter base model GLM-130B is largely responsible for the increased capabilities of the current version of ChatGLM. Unlike BERT, GPT-3, or T5, this model is an autoregressive pre-training architecture with multiple training objectives. Researchers released the 130-billion-parameter, Chinese-English dense model GLM-130B to the academic and business communities in August 2022.
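To make the pre-training objective concrete, below is a toy illustration of the autoregressive blank-infilling idea the GLM framework is built on. This is not GLM's actual preprocessing code; the token list, span choice, and special-token names are illustrative only.

```python
# Toy sketch of GLM-style autoregressive blank infilling (illustrative, not the real pipeline):
# a sampled span is replaced by [MASK] in "Part A", and the model then regenerates
# that span token by token in "Part B".
tokens = ["ChatGLM", "is", "a", "bilingual", "dialogue", "model"]
start, end = 2, 5  # pretend the span sampler blanked out "a bilingual dialogue"

# Part A: corrupted text the model attends to bidirectionally.
part_a = tokens[:start] + ["[MASK]"] + tokens[end:]

# Part B: autoregressive targets -- each blanked token is predicted conditioned on
# Part A plus the span tokens generated so far.
part_b = ["[START]"] + tokens[start:end] + ["[END]"]

print("Part A:", part_a)  # ['ChatGLM', 'is', '[MASK]', 'model']
print("Part B:", part_b)  # ['[START]', 'a', 'bilingual', 'dialogue', '[END]']
```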

ChatGLM advantages and key features

  • It processes text in various languages and has natural language comprehension and generation capabilities.
  • It has been trained on a large amount of data and is knowledgeable in many areas, so it can provide people with accurate and helpful information and answers.
  • It can infer the relevant relationships and logic between texts in response to user queries.
  • It can learn from its users and environments and automatically update and enhance its models and algorithms.
  • Several sectors benefit from this technology, including education, healthcare, and banking.
  • It assists individuals in finding answers and resolving issues more quickly and easily.
  • It raises awareness of and pushes for progress in the field of artificial intelligence.

Challenges and Limitations

  • It is a machine model without feelings or awareness, so it lacks the capacity for empathy and moral reasoning that humans share.
  • It can easily be misled or draw incorrect conclusions, since its knowledge depends on data and algorithms.
  • It is uncertain when responding to abstract or complex questions and may fail to answer such inquiries accurately.

ChatGLM-130B

In November 2022, the Big Model Center at Stanford University evaluated 30 of the most popular large models from across the globe, and GLM-130B was the only model from Asia to make the cut. According to the evaluation report, in terms of accuracy and maliciousness indicators, robustness, and calibration error, GLM-130B is close to or on par with GPT-3 175B (davinci) among all base large models at the 100-billion-parameter scale. This is in comparison to the major models of OpenAI, Google Brain, Microsoft, Nvidia, and Facebook.

ChatGLM-6B

ChatGLM-6B is a 6.2-billion-parameter Chinese-English language model. It is a Chinese question-answering and dialogue system built on the same technology as ChatGLM (chatglm.cn), and it can run inference on a single 2080Ti. The researchers have open-sourced the ChatGLM-6B model to further facilitate the community's development of large-model technologies.

The ChatGLM-6B model is an open-source, 6.2-billion-parameter bilingual version of the General Language Model (GLM) framework. Quantization allows users to deploy it locally on low-end graphics hardware.

Using a method very similar to ChatGPT's, ChatGLM-6B is designed for question answering in Chinese. The researchers trained the model on a combined corpus of about 1T Chinese and English tokens and applied supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback. With roughly 6.2 billion parameters, the model can generate responses consistent with human preferences.
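As a quick orientation, here is a minimal sketch of running ChatGLM-6B locally with Hugging Face Transformers, following the interface documented in the THUDM/ChatGLM-6B repository. It assumes a CUDA GPU with roughly 13 GB of VRAM for FP16 inference; the `chat()` helper comes from the model's remote code, not from Transformers itself.

```python
# Minimal local-inference sketch for ChatGLM-6B (assumes the repository's documented interface).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# The repository exposes a chat() helper that keeps multi-turn history.
response, history = model.chat(tokenizer, "What is ChatGLM-6B?", history=[])
print(response)

# A follow-up turn reuses the accumulated history for context.
response, history = model.chat(tokenizer, "How much VRAM does it need?", history=history)
print(response)
```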

 Features that set ChatGLM-6B apart

  • ChatGLM-6B is trained on 1T tokens of bilingual content, a mixture of Chinese and English at a 1:1 ratio.
  • Based on the GLM-130B training experience, the architecture adopts two-dimensional RoPE positional encoding and a conventional FFN structure. ChatGLM-6B’s manageable parameter size of 6B (6.2 billion) also allows independent tuning and deployment by academics and individual developers.
  • At least 13 GB of video RAM is needed for ChatGLM-6B inference at FP16 half precision. Combined with model quantization technology, this requirement can be reduced further to 10 GB (INT8) or 6 GB (INT4), allowing ChatGLM-6B to be deployed on consumer-grade graphics cards (see the sketch after this list).
  • ChatGLM-6B has a sequence length of 2,048, making it suitable for longer conversations and applications than GLM-10B (sequence length: 1,024).
  • The model is trained to follow human instructions using supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback. As a result, it can produce output in the markdown format shown in its demos.
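The sketch below illustrates the quantized deployment path mentioned above. It assumes the `quantize()` helper shipped with the ChatGLM-6B repository's model code; the exact call order and memory figures are taken from the description above and the repository's documentation, so treat them as indicative rather than guaranteed.

```python
# Quantized local deployment sketch (assumes the quantize() helper from the ChatGLM-6B repo code).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# quantize(8) -> INT8, roughly 10 GB of VRAM; quantize(4) -> INT4, roughly 6 GB,
# which fits consumer-grade graphics cards.
model = (
    AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    .half()
    .quantize(4)
    .cuda()
    .eval()
)

response, _ = model.chat(tokenizer, "Hello from a consumer GPU!", history=[])
print(response)
```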

ChatGLM-6B Limitations 

  • The limited capacity of a 6B model constrains its memory and language ability. ChatGLM-6B may give incorrect advice when asked to do anything requiring extensive factual knowledge or to solve logical problems (such as mathematics or programming).
  • As a language model that is only loosely aligned with human intent, ChatGLM-6B can produce biased and potentially harmful output.
  • ChatGLM-6B’s ability to interpret context is insufficient. The conversation may lose its context, and comprehension errors may occur, when answers take a long time to generate or when several rounds of dialogue are required.
  • Most of the training material is in Chinese, with only a fraction in English. Response quality may therefore suffer when English instructions are used, and the answers may even contradict those given for the same instructions in Chinese.
  • Misleading output: ChatGLM-6B can have issues with “self-perception,” making it vulnerable to being led astray and giving incorrect information; if misled, for instance, the current version of the model will present a skewed sense of its own identity. Although the model has undergone bilingual pre-training on about 1 trillion tokens, instruction fine-tuning, and reinforcement learning from human feedback (RLHF), it may still produce harmful or misleading content under certain instructions because of its limited capabilities.

Check out the GitHub Link and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 16k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today’s evolving world, making everyone’s life easier.