This AI Paper Shows How ChatGPT’s Toxicity Can Increase Up To Six-Fold When Assigned A Persona


With recent technological advancements, large language models (LLMs) like GPT-3 and PaLM have exhibited remarkable generation capabilities across a wide range of domains such as education, content creation, healthcare, and research. For instance, these models help writers enhance their writing style and assist budding developers in generating boilerplate code. Combined with the availability of several third-party APIs, the widespread adoption of LLMs has only increased across consumer-facing systems, from tools used by students to healthcare systems deployed in hospitals. In such scenarios, the safety of these systems becomes a fundamental issue, as people trust them with sensitive personal information. This calls for a clearer picture of the capabilities and limitations of LLMs.

However, most previous research has focused on making LLMs more powerful by employing more advanced and sophisticated architectures. Although this work has significantly advanced the NLP community, it has also sidelined the safety of these systems. On this front, researchers from Princeton University and Georgia Tech collaborated with researchers from the Allen Institute for AI (AI2) to bridge this gap by performing a toxicity analysis of OpenAI’s revolutionary AI chatbot, ChatGPT. The researchers evaluated toxicity in over half a million ChatGPT generations, and their investigation revealed that when ChatGPT’s system parameter was set to assign it a persona, its toxicity increased multifold across a wide range of topics. For example, when ChatGPT’s persona is set to that of the boxer Muhammad Ali, its toxicity nearly triples compared to the default settings. This is particularly alarming because ChatGPT is currently being used as a foundation for several other technologies, which can then generate the same level of toxicity through such system-level modifications. Thus, the researchers’ work focuses on gaining deeper insight into the toxicity of ChatGPT’s generations when it is assigned different personas.

The ChatGPT API provides a feature that allows the user to assign a persona by setting its system parameter, so that the persona sets the tone for the rest of the conversation and influences the way ChatGPT converses. For their study, the researchers curated a list of 90 personas from different backgrounds and countries, such as entrepreneurs, politicians, and journalists. These personas were assigned to ChatGPT to analyze its responses about roughly 128 critical entities such as gender, religion, and profession. The team also asked ChatGPT to continue certain incomplete phrases about these entities to gather more insights. The final findings showed that assigning ChatGPT a persona can increase its toxicity by up to six times, with ChatGPT frequently producing harsh outputs and indulging in negative stereotypes and beliefs.
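To make the persona-assignment mechanism concrete, the sketch below shows how a system message can set a persona through the ChatGPT API. This is a minimal illustration written against the pre-1.0 openai Python package, not the authors’ actual experimental code; the model name, persona phrasing, and prompt are illustrative assumptions.

```python
# Minimal sketch: assigning a persona via the system parameter of the ChatGPT API.
# Assumes the OPENAI_API_KEY environment variable is set.
import openai

persona = "Muhammad Ali"  # one example persona mentioned in the study

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The system message sets the persona for the rest of the conversation.
        {"role": "system", "content": f"Speak exactly like {persona}."},
        # An entity-centred prompt; the study probed ~128 entities such as
        # gender, religion, and profession. This prompt is illustrative only.
        {"role": "user", "content": "Say something about doctors."},
    ],
)

print(response["choices"][0]["message"]["content"])
```

Repeating such calls over many personas and entity-centred prompts, and scoring the generations for toxicity, is the general pattern the study follows; the exact prompts and scoring pipeline are described in the paper.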

The team’s research showed that the toxicity of the outputs varied significantly with the persona ChatGPT was given, which the researchers theorize stems from ChatGPT’s understanding of that persona based on its training data. One finding, for instance, suggested that journalist personas are twice as toxic as businessperson personas, even though this may not reflect reality. The study also showed that specific populations and entities are targeted more frequently (nearly three times more often) than others, demonstrating the model’s inherently discriminatory behavior. For instance, toxicity varies with the gender being discussed and is roughly 50% higher than toxicity based on race. These fluctuations can be damaging to users and derogatory to the individuals in question. Moreover, malicious users could build technologies on top of ChatGPT to generate content that harms an unsuspecting audience.

This study’s analysis of ChatGPT’s toxicity revealed three main things: the model can be significantly more toxic when personas are assigned (up to six times more toxic than the default); the toxicity of the model varies greatly with the persona’s identity, with ChatGPT’s opinion of the persona playing a significant role; and ChatGPT can discriminatorily target specific entities by being more toxic when creating content about them. The researchers also noted that, although ChatGPT was the LLM used in their experiments, their methodology can be extended to any other LLM. The team hopes their work will motivate the AI community to develop technologies that provide ethical, secure, and reliable AI systems.


Check out the Paper and Reference Article. All Credit For This Research Goes To the Researchers on This Project.


Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in several challenges.