Deep language models keep improving by learning to anticipate the next word from its context.


Deep learning has made significant strides in text generation, translation, and completion in recent years. Algorithms trained to predict words from their surrounding context have been instrumental in these advances. Yet despite access to vast amounts of training data, deep language models still struggle with tasks such as long story generation, summarization, coherent dialogue, and information retrieval. They have been shown to have difficulty capturing syntactic and semantic properties, and their linguistic understanding remains superficial. Predictive coding theory suggests that the human brain makes predictions over multiple timescales and levels of representation across the cortical hierarchy. Although previous studies have shown evidence of speech predictions in the brain, the nature of the predicted representations and their temporal scope remain largely unknown. Recently, researchers analyzed the brain signals of 304 individuals listening to short stories and found that enhancing deep language models with long-range and multi-level predictions improved brain mapping.

The results of this study revealed a hierarchical organization of language predictions in the cortex. These findings align with predictive coding theory, which suggests that the brain makes predictions over multiple levels of representation and multiple timescales. By incorporating these ideas into deep language models, researchers can narrow the gap between human language processing and deep learning algorithms.

The current study evaluated specific hypotheses of predictive coding theory by examining whether the cortical hierarchy predicts several levels of representation, spanning multiple timescales, beyond the short-range, word-level predictions usually learned by deep language algorithms. Modern deep language models were compared with the brain activity of 304 people listening to spoken stories. The activations of deep language algorithms supplemented with long-range and high-level predictions were found to describe brain activity best.
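
As a rough illustration of this kind of brain-mapping analysis, the sketch below fits a linear encoding model from language-model activations to fMRI responses and scores it on held-out data, once with only current-word activations and once with an added "forecast" representation of upcoming words. All names, shapes, and the random placeholder data are assumptions for illustration, not the study's actual pipeline.

```python
# Hedged sketch of an encoding ("brain mapping") analysis: regress fMRI
# responses on language-model activations and score the fit on held-out data.
# Shapes, names, and the random placeholder data are illustrative only.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

def brain_score(X, Y):
    """Correlation between predicted and measured responses, one value per voxel."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)
    model = RidgeCV(alphas=np.logspace(-1, 6, 8)).fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    Y_hat_c = Y_hat - Y_hat.mean(axis=0)
    Y_te_c = Y_te - Y_te.mean(axis=0)
    r = (Y_hat_c * Y_te_c).sum(axis=0) / (
        np.linalg.norm(Y_hat_c, axis=0) * np.linalg.norm(Y_te_c, axis=0) + 1e-8)
    return r

# Placeholder data: per-word model activations and per-word fMRI responses.
rng = np.random.default_rng(0)
X_now = rng.standard_normal((2000, 768))      # activations for the current words
X_future = rng.standard_normal((2000, 768))   # activations for words further ahead
Y = rng.standard_normal((2000, 500))          # fMRI responses, one column per voxel

baseline = brain_score(X_now, Y).mean()
with_forecast = brain_score(np.hstack([X_now, X_future]), Y).mean()
print(f"gain from adding forecast features: {with_forecast - baseline:+.4f}")
```

On real data, a positive gain in a region would indicate that its activity is better explained when the model's features also encode what comes next in the story.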

The study made three main contributions. First, it found that the supramarginal gyrus and the lateral, dorsolateral, and inferior frontal cortices had the largest prediction distances and actively anticipate future language representations. Second, the depth of predictive representations varies along a similar anatomical organization: the superior temporal sulcus and gyrus are best modeled by low-level predictions, while high-level predictions best model the middle temporal, parietal, and frontal regions. Finally, it demonstrated that semantic features, rather than syntactic ones, drive long-range predictions.

The data showed that the lateral, dorsolateral, and inferior frontal cortices and the supramarginal gyrus have the longest prediction distances. These cortical areas are linked to high-level executive functions such as abstract thought, long-term planning, attentional regulation, and high-level semantics. According to the research, these regions, which sit at the top of the language hierarchy, may not only passively process past stimuli but actively anticipate future language representations.
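
One hedged way to picture a region's "prediction distance" is to sweep how far ahead the forecast representation looks and keep the distance that most improves the encoding fit. The sketch below reuses the hypothetical brain_score function from the earlier example; the shifted-window construction is an assumption, not the authors' exact formulation.

```python
# Toy sketch: sweep the forecast distance d and pick the one that most
# improves the fit for a given region (reuses brain_score from above).
import numpy as np

def forecast_features(activations, d):
    """Activations of the word d positions ahead (shifted copy, zero-padded)."""
    future = np.zeros_like(activations)
    if d > 0:
        future[:-d] = activations[d:]
    return future

def peak_prediction_distance(activations, Y_region, distances=range(1, 11)):
    """Forecast distance that yields the largest improvement over the baseline fit."""
    base = brain_score(activations, Y_region).mean()
    gains = [
        brain_score(np.hstack([activations, forecast_features(activations, d)]),
                    Y_region).mean() - base
        for d in distances
    ]
    return list(distances)[int(np.argmax(gains))]
```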

The study also demonstrated that the depth of predictive representations varies along the same anatomical organization: the superior temporal sulcus and gyrus are best accounted for by low-level predictions, whereas the middle temporal, parietal, and frontal regions are best accounted for by high-level predictions. The results are consistent with the hypothesis that, in contrast to present-day language algorithms, the brain predicts representations at several levels rather than only at the word level.
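
To give a concrete sense of "prediction depth", the hedged sketch below extracts a shallow and a deep layer from GPT-2 with Hugging Face transformers and asks which one, used as the forecast representation, better explains a region's activity. It reuses brain_score and forecast_features from the earlier sketches; the layer indices and the scoring recipe are assumptions.

```python
# Hedged sketch of "prediction depth": compare a shallow (low-level) and a deep
# (high-level) layer of GPT-2 as forecast representations for one brain region.
# Reuses brain_score and forecast_features from the sketches above.
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)

text = "Once upon a time, a listener settled in for a short story."
tokens = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**tokens).hidden_states     # tuple: embeddings plus one entry per layer
shallow_acts = hidden[2][0].numpy()            # an early, low-level layer
deep_acts = hidden[10][0].numpy()              # a late, high-level layer

def depth_preference(shallow, deep, Y_region, d=8):
    """Positive means the region is better modeled by deep (high-level) forecasts."""
    low = brain_score(np.hstack([shallow, forecast_features(shallow, d)]), Y_region).mean()
    high = brain_score(np.hstack([deep, forecast_features(deep, d)]), Y_region).mean()
    return high - low
```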

Finally, the researchers decomposed the brain activations into syntactic and semantic representations and found that semantic features, rather than syntactic ones, drive long-range predictions. This finding supports the hypothesis that high-level semantic prediction lies at the heart of long-form language processing.
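
Such a decomposition can be pictured, very roughly, as isolating what alternative wordings with the same syntactic structure have in common. The sketch below is a loose illustration of that idea, not the authors' exact procedure; how the syntactically matched alternatives would be generated is left as an assumption.

```python
# Loose sketch: split activations into a syntactic part (shared across
# alternative wordings with the same syntax) and a semantic residual.
# How the syntactically matched alternatives are produced is assumed here.
import numpy as np

def split_syntax_semantics(activation_sets):
    """activation_sets[0] holds the actual story's activations, (n_words, n_dims);
    the rest come from alternative wordings sharing its syntactic structure."""
    stack = np.stack(activation_sets)           # (n_alternatives, n_words, n_dims)
    syntactic = stack.mean(axis=0)              # what all wordings share
    semantic = activation_sets[0] - syntactic   # what is specific to the real story
    return syntactic, semantic
```

Each component could then be fed separately into the same encoding analysis to ask which one carries the long-range improvement.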

The study’s overall conclusion is that performance on natural language processing benchmarks might be improved, and models might become more brain-like, by systematically training algorithms to predict multiple timescales and levels of representation.
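
What such training could look like is sketched below in a hedged, PyTorch-style form: the usual next-word loss plus auxiliary heads that predict the model's own deeper representations several tokens ahead. This is an illustration of the concluding idea, not a recipe from the study; the backbone, horizons, and loss weighting are all assumptions.

```python
# Hedged sketch of a multi-timescale, multi-level training objective:
# next-token prediction plus auxiliary heads that predict the model's own
# hidden representations several tokens ahead. Purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleLM(nn.Module):
    def __init__(self, backbone, hidden_dim, vocab_size, horizons=(2, 4, 8)):
        super().__init__()
        self.backbone = backbone          # any causal model returning (batch, seq, hidden_dim)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)
        self.horizons = horizons
        self.forecast_heads = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in horizons)

    def forward(self, tokens):
        h = self.backbone(tokens)                       # (batch, seq, hidden_dim)
        logits = self.lm_head(h[:, :-1])                # standard next-word objective
        lm_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
        aux = 0.0
        for head, d in zip(self.forecast_heads, self.horizons):
            pred = head(h[:, :-d])                      # prediction d tokens ahead
            target = h[:, d:].detach()                  # the model's own future representation
            aux = aux + F.mse_loss(pred, target)
        return lm_loss + 0.1 * aux                      # the weighting is an arbitrary choice
```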