Researchers from MIT and Harvard Introduce Language Models Trained on Media Diets that can Predict Public Opinion


Public opinion both reflects and shapes how a society behaves, yet traditional survey-based approaches to measuring it have limitations. How far AI can understand and adopt attitudes expressed in human language remains an open question. Answering it has become increasingly pressing as large language models such as GPT-3, PaLM, ChatGPT, Claude, and Bard develop and become more widely used.

A recent study by researchers at MIT and Harvard University builds on recent advances in natural language processing that summarize large datasets to aid human decision-making. The researchers present a new method for studying media diet models: language models adapted to mimic the perspectives of subpopulations based on the media they consume (such as online news, TV broadcasts, or radio shows).

The researchers show that media diet models, applied in public health and economic contexts, have predictive power, are robust to question framing, work across media types, and retain predictive signal after accounting for demographics. Additional analyses show that the models are sensitive to how closely individuals follow the news and that their effects vary with the type of question asked.

To anticipate how a subpopulation will answer a survey question, the team uses a computational model that takes as input a description of the subpopulation’s media diet and the question being asked. If such models can accurately forecast the results of human surveys, they can serve as in silico models of public opinion. Such an approach could help with questions of public sentiment (such as “How do people feel about the pandemic?”) and with scientific inquiry into media effects (such as “How does media diet affect perceptions of the pandemic?”).

Building a media diet model involves three stages:

  1. A base language model that can predict missing words in a text is created or selected. In this work, the researchers mostly use BERT, a pretrained model.
  2. The language model is adapted by training it on a media diet dataset, which contains content from particular media outlets over a given time frame. The researchers use online news and transcripts of TV and radio shows. This adaptation lets the model absorb new information and update its internal knowledge representations.
  3. The adapted model is queried with survey questions, and its response distributions are compared with those of populations that have different media diets.
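
To make the third stage concrete, here is a minimal sketch of how a fill-in-the-blank query could be turned into a response distribution. It is an illustration under stated assumptions, not the researchers' actual scoring procedure: the prompt, the candidate answer words, the helper name `response_distribution`, and all probability values are hypothetical. In practice the probabilities would come from a masked language model (such as BERT) that has been adapted to one outlet's content.

```python
from typing import Dict, List


def response_distribution(mask_probs: Dict[str, float],
                          answers: List[str]) -> Dict[str, float]:
    """Renormalize masked-word probabilities over the candidate answers.

    mask_probs maps words to the probability a masked language model
    assigns them at the blank; answers lists the survey options we
    care about. Both are hypothetical inputs for this sketch.
    """
    total = sum(mask_probs.get(a, 0.0) for a in answers)
    if total == 0:
        raise ValueError("model assigned no probability to any answer")
    return {a: mask_probs.get(a, 0.0) / total for a in answers}


# Hypothetical survey item rephrased as a fill-in-the-blank prompt.
prompt = "Wearing a mask in public is [MASK]."
answers = ["necessary", "unnecessary"]

# Made-up probabilities standing in for two models, each adapted to a
# different (hypothetical) media diet. These are NOT results from the paper.
outlet_a_probs = {"necessary": 0.12, "unnecessary": 0.04, "common": 0.30}
outlet_b_probs = {"necessary": 0.05, "unnecessary": 0.15, "common": 0.25}

dist_a = response_distribution(outlet_a_probs, answers)
dist_b = response_distribution(outlet_b_probs, answers)

print(prompt)
print("outlet A:", dist_a)
print("outlet B:", dist_b)
```

Renormalizing over the answer words alone keeps unrelated completions (like "common" here) from diluting the comparison between media diets.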

For public opinion forecasting, the researchers use regression models in which the media diet model’s responses are used to predict the observed survey responses. The polling data come from statewide surveys on COVID-19 and consumer confidence. Finally, they use a nearest-neighbor method to trace a forecast for a specific survey question back to the source media diet data from which it was derived.
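
The regression step can be illustrated with a simple one-variable least-squares fit. This is a simplified sketch, not the study's actual specification, which the article says also accounts for demographics. The function names `fit_ols` and `r_squared` and all data values below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x: List[float], y: List[float],
              slope: float, intercept: float) -> float:
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up toy values, NOT data from the paper:
# model_scores = share of the model's probability on a given answer,
# survey_share = share of survey respondents who chose that answer.
model_scores = [0.20, 0.35, 0.50, 0.65, 0.80]
survey_share = [0.31, 0.38, 0.52, 0.58, 0.71]

slope, intercept = fit_ols(model_scores, survey_share)
fit_quality = r_squared(model_scores, survey_share, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={fit_quality:.3f}")
```

A positive slope with a high R² on held-out survey questions would suggest that the model's responses carry predictive signal about what the surveyed population actually answered.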

The importance of media diet research is bolstered by three interconnected issues: 

  1. Selective exposure, the systematic bias in which people gravitate toward information that is consistent with their prior beliefs.
  2. Echo chambers, in which beliefs shared among like-minded individuals are amplified and reinforced by the environment they choose.
  3. Filter bubbles, where content curation and recommendation algorithms surface items based on users’ past activities, again reinforcing the users’ worldviews.

Media diet models could be used to identify which groups are exposed to the most potentially harmful messages. They also offer a way to study more nuanced effects of communication, such as how differences in word choice change how a message resonates. Although such effects have been studied in controlled lab settings and, to a lesser extent, online, media effects researchers have been hampered by a lack of appropriate tools.

The team hopes that these models will eventually be used to help solve real-world problems with a focus on people.




Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring the new advancements in technologies and their real-life application.