HMN 2026: How AI models mirror human ‘us vs. them’ social biases

ION (Ingroup-Outgroup Neutralization), the team’s strategy to mitigate the “us vs. them” bias in LLMs. Credit: *arXiv* (2025). DOI: 10.48550/arxiv.2512.13699

Large language models (LLMs), the computational models underpinning ChatGPT, Gemini and other widely used artificial intelligence (AI) platforms, can rapidly source information and generate texts tailored for specific purposes. Because these models are trained on large amounts of text written by humans, they could exhibit some human-like biases: inclinations that deviate from objectivity by favoring specific stimuli, ideas or groups.

One of these biases, known as the “us vs. them” bias, is the tendency of people to prefer groups they belong to, viewing other groups less favorably. This effect is well-documented in humans, but it has so far remained largely unexplored in LLMs.

Researchers at the University of Vermont’s Computational Story Lab and Computational Ethics Lab recently carried out a study investigating the possibility that LLMs “absorb” the “us vs. them” bias from the texts that they are trained on, exhibiting a similar tendency to prefer some groups over others. Their paper, posted to the arXiv preprint server, suggests that many widely used models, including GPT-4.1, DeepSeek-3.1, Gemma-2.0, Grok-3.0 and LLaMA-3.1, tend to express a preference for groups that are referred to favorably in training texts.

“This study investigates ‘us versus them’ bias, as described by Social Identity Theory, in large language models (LLMs) under both default and persona-conditioned settings across multiple architectures,” Tabia Tanzin Prama, Julia Witte Zimmerman and their colleagues wrote in their paper. “Using sentiment dynamics, allotaxonometry, and embedding regression, we find consistent ingroup-positive and outgroup-negative associations across foundational LLMs.”
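
To make the idea of ingroup-positive and outgroup-negative associations concrete, here is a minimal Python sketch of a sentiment gap between sentences about “us” and sentences about “them.” It is a toy illustration only, not the authors’ pipeline: the word lists, the example sentences, and the function names (`sentiment_score`, `mean_score`, `sentiment_gap`) are all hypothetical assumptions, and a real study would rely on a trained sentiment model rather than a hand-made lexicon.

```python
# Toy illustration only: the lexicon, sentences, and scoring rule below are
# hypothetical assumptions, not the data or method used in the paper.
POSITIVE = {"kind", "honest", "strong", "fair"}
NEGATIVE = {"lazy", "dishonest", "hostile", "weak"}


def sentiment_score(sentence: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def mean_score(sentences: list[str]) -> float:
    """Average sentiment over a list of sentences (0.0 for an empty list)."""
    if not sentences:
        return 0.0
    return sum(sentiment_score(s) for s in sentences) / len(sentences)


def sentiment_gap(ingroup: list[str], outgroup: list[str]) -> float:
    """Positive values mean ingroup sentences are described more favorably."""
    return mean_score(ingroup) - mean_score(outgroup)


# Hypothetical model outputs for "We are ..." and "They are ..." style prompts.
ingroup_sentences = ["We are kind and honest.", "We are strong."]
outgroup_sentences = ["They are lazy and hostile.", "They are fair."]

gap = sentiment_gap(ingroup_sentences, outgroup_sentences)
print(f"ingroup minus outgroup sentiment: {gap:+.2f}")
```

A gap near zero would indicate that both groups are described in similar terms, while a large positive gap would correspond to the ingroup-favoring pattern the researchers report.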

Detecting group biases of LLMs

As part of their study, the researchers assessed several recently developed LLMs, specifically looking at how different social groups were mentioned in their responses. The models they evaluated were GPT-4.1, DeepSeek-3.1, Gemma-2.0, Grok-3.0, and LLaMA-3.1.

The team found that every model they tested exhibited an “us vs. them” bias. When the models were asked to take on a specific “persona,” such as that of a person with a more conservative or a more liberal political inclination, their language shifted significantly, following patterns consistent with those political views.

“We find that adopting a persona systematically alters models’ evaluative and affiliative language patterns,” wrote Prama, Zimmerman and their colleagues. “For the exemplar personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity. Persona conditioning produces distinct clustering in embedding space and measurable semantic divergence, supporting the view that even abstract identity cues can shift models’ linguistic behavior.”

The researchers also tested what happened if the prompts fed to the models targeted specific groups of people. Notably, they found that these queries led to more hostile responses from the AI models, increasing the negative language used to describe out-groups by 1.19% to 21.76%.

“These findings suggest that LLMs learn not only factual associations about social groups but also internalize and reproduce distinct ways of being, including attitudes, worldviews, and cognitive styles that are activated when enacting personas,” wrote the authors. “We interpret these results as evidence of a multi-scale coupling between local context (e.g., the persona prompt), localizable representations (what the model ‘knows’), and global cognitive tendencies (how it ‘thinks’), which are at least reflected in the training data.”

Tackling the ‘us vs. them’ bias in AI

The results of this recent study highlight the tendency of AI models to pick up the biases and views expressed in the data used to train them. In their paper, Prama, Zimmerman and their colleagues introduced a strategy that could help to reduce this bias in LLMs, which they dubbed ION.

“We demonstrate ION, an ‘us versus them’ bias mitigation approach using fine-tuning and direct preference optimization (DPO), which reduces sentiment divergence by up to 69%, highlighting the potential for targeted mitigation strategies in future LLM development,” wrote Prama, Zimmerman and their colleagues.
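
As a rough guide to reading that figure, a relative reduction compares the divergence after mitigation with the divergence before it. The short Python sketch below works through the arithmetic. The baseline and mitigated values are hypothetical numbers chosen only so that the result matches the reported upper bound of 69%; they are not taken from the paper, and `relative_reduction` is an illustrative helper, not part of ION.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a divergence measure shrinks relative to its baseline."""
    if before == 0:
        raise ValueError("baseline divergence must be non-zero")
    return (before - after) / before


# Hypothetical values, chosen only to illustrate a 69% reduction.
baseline_divergence = 0.50
mitigated_divergence = 0.155

reduction = relative_reduction(baseline_divergence, mitigated_divergence)
print(f"relative reduction: {reduction:.0%}")
```

Under this reading, a 69% reduction means that roughly a third of the original ingroup-outgroup sentiment divergence remains after mitigation.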

In the future, the researchers could set out to uncover more biases that AI models acquire during training, while also potentially introducing other bias mitigating strategies. Their efforts could collectively contribute to the development of fairer and more objective LLMs.

Written by Ingrid Fadelli, edited by Lisa Lock.

More information:
Tabia Tanzin Prama et al, Us-vs-Them bias in Large Language Models, arXiv (2025). DOI: 10.48550/arxiv.2512.13699

Journal information:
arXiv
