How Transformer LLMs Extract Knowledge From Their Parameters


Transformer-based large language models (LLMs) are known to capture and store factual knowledge in their parameters. However, how these models extract factual associations during inference remains relatively underexplored. A recent study by researchers from Google DeepMind, Tel Aviv University, and Google Research examines the internal mechanisms by which transformer-based LLMs store and extract factual associations.

The study takes an information flow approach: given a query that names a subject and a relation, it investigates how the model predicts the correct attribute and how internal representations evolve across layers to produce the output. The researchers focused on decoder-only LLMs and identified critical computational points at the relation and subject positions. They did so with a “knock out” strategy that blocks the last position from attending to other positions at specific layers and then measures the impact on the model’s prediction.
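The knock-out idea can be illustrated on a single attention head. The following is a minimal NumPy sketch, not the study's code: the function name causal_attention, the blocked_edges parameter, the random inputs, and the subject positions are all assumptions made for illustration. It masks chosen attention edges before the softmax so that the last position receives no information from the blocked positions.

```python
import numpy as np

def causal_attention(q, k, v, blocked_edges=()):
    """Single-head causal attention with optional knocked-out edges.

    blocked_edges holds (query_pos, key_pos) pairs (hypothetical parameter);
    each listed edge gets a score of -inf, so its softmax weight is zero.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    for i, j in blocked_edges:
        mask[i, j] = True
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_head = 6, 8
q, k, v = (rng.normal(size=(n_tokens, d_head)) for _ in range(3))

subject_positions = [1, 2]  # assumed subject token positions in a toy prompt
last = n_tokens - 1

base_out, base_w = causal_attention(q, k, v)
ko_out, ko_w = causal_attention(
    q, k, v, blocked_edges=[(last, j) for j in subject_positions]
)

print("last-token weights before:", np.round(base_w[last], 3))
print("last-token weights after: ", np.round(ko_w[last], 3))
```

In a real model, the same masking would be applied inside the attention sublayers of the chosen layers, and the effect would be read off from the change in the probability assigned to the correct attribute.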

To pinpoint where attribute extraction occurs, the researchers analyzed the information propagating at these critical points and the process that constructs the preceding representations. They did so by projecting representations onto the vocabulary and by applying additional interventions to the model’s multi-head self-attention (MHSA) and multi-layer perceptron (MLP) sublayers.
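One common way to inspect what an intermediate representation encodes is to project it onto the vocabulary and read off the highest-scoring tokens. The sketch below shows the idea under strong simplifying assumptions: the five-word vocabulary, the orthonormal unembedding matrix E, and the helper top_tokens are all hypothetical, and a real model would also apply its final layer normalization before the projection.

```python
import numpy as np

# Toy vocabulary and unembedding matrix (both hypothetical).
vocab = ["Paris", "London", "guitar", "France", "piano"]
d_model = 16
E = np.eye(len(vocab), d_model)  # one orthonormal row per vocabulary token

def top_tokens(hidden, unembed, vocab, k=2):
    """Return the k tokens whose unembedding rows score highest against hidden."""
    logits = unembed @ hidden
    order = np.argsort(-logits)[:k]
    return [vocab[i] for i in order]

# A toy hidden state that mixes two token directions with different strengths.
hidden = 2.0 * E[vocab.index("Paris")] + 1.0 * E[vocab.index("France")]

print(top_tokens(hidden, E, vocab))  # ['Paris', 'France']
```

Applied to the representation at a given position and layer, the same projection gives a rough picture of which tokens that representation promotes.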

The researchers identified an internal mechanism for attribute extraction that combines a subject enrichment process with an attribute extraction operation. First, information about the subject is enriched at the last subject token across the early layers of the model. Second, the relation is passed to the last token. Finally, the last token uses the relation to extract the corresponding attribute from the subject representation via attention head parameters.
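To make the final step concrete, here is a deliberately tiny sketch of how a linear map inside an attention head could encode subject-to-attribute associations. Everything in it is an assumption for illustration: the vocabulary, the orthonormal embeddings, the two toy facts, and the matrix W_OV (standing in for a head's combined value and output projections) are hypothetical and are not taken from the study. The sketch also omits the role of the relation and of the attention weights.

```python
import numpy as np

# Toy vocabulary with orthonormal embeddings (hypothetical).
vocab = ["Eiffel Tower", "Paris", "Big Ben", "London"]
E = np.eye(len(vocab))

def emb(token):
    """Look up the toy embedding of a token."""
    return E[vocab.index(token)]

# Hypothetical head matrix storing two subject -> attribute mappings
# as a sum of outer products (attribute direction x subject direction).
W_OV = (
    np.outer(emb("Paris"), emb("Eiffel Tower"))
    + np.outer(emb("London"), emb("Big Ben"))
)

def extract_attribute(subject_repr):
    """Apply the head matrix, then project the result onto the vocabulary."""
    head_out = W_OV @ subject_repr
    logits = E @ head_out
    return vocab[int(np.argmax(logits))]

print(extract_attribute(emb("Eiffel Tower")))  # Paris
print(extract_attribute(emb("Big Ben")))       # London
```

In this toy setting the attribute is recovered purely from the head's parameters once the subject representation is fed through it, which is the intuition behind extraction "via attention head parameters".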

The findings offer insights into how factual associations are stored and extracted internally in LLMs. The researchers believe these findings could open new research directions for knowledge localization and model editing. For example, the study’s approach could be used to identify the internal mechanisms by which LLMs acquire and store biased information and to develop methods for mitigating such biases.

Overall, this study highlights the importance of examining the internal mechanisms by which transformer-based LLMs store and extract factual associations. By understanding these mechanisms, researchers can develop more effective methods for improving model performance and reducing biases. Additionally, the study’s approach could be applied to other areas of natural language processing, such as sentiment analysis and language translation, to better understand how these models operate internally.