HMN 2025: New model frames human reinforcement learning in the context of memory and habits

Humans and most other animals are known to be strongly driven by expected rewards or adverse consequences. The process of acquiring new skills or adjusting behaviors in response to positive outcomes is known as reinforcement learning (RL).

RL has been widely studied over the past decades and has been adapted to train computational models, including some deep learning algorithms. Existing models of RL suggest that this type of learning is linked to dopaminergic pathways (i.e., neural pathways that respond to differences between expected and experienced outcomes).
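
To make the idea of a "difference between expected and experienced outcomes" concrete, here is a minimal Python sketch of a textbook delta-rule update. It is a generic simplification, not the model from the paper; the function name `delta_rule_update` and the learning rate `alpha` are illustrative assumptions.

```python
def delta_rule_update(value, reward, alpha=0.1):
    """Return the updated value estimate after one outcome.

    value:  current expected reward for an action (illustrative)
    reward: outcome actually experienced
    alpha:  learning rate (assumed; not taken from the article)
    """
    prediction_error = reward - value  # experienced minus expected
    return value + alpha * prediction_error


# Example: an action expected to pay 0.5 delivers a reward of 1.0,
# so the estimate moves a small step upward.
new_value = delta_rule_update(0.5, 1.0, alpha=0.1)
```

A positive prediction error raises the estimate and a negative one lowers it, which is the outcome-sensitive behavior that standard RL accounts rely on.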

Anne G. E. Collins, a researcher at the University of California, Berkeley, recently developed a new model of RL specific to situations in which the outcomes of people’s choices are uncertain and context-dependent, and they must learn which actions lead to rewards. Her paper, published in Nature Human Behaviour, challenges the assumption that existing RL algorithms faithfully mirror psychological and neural mechanisms.

“A goal of my research is to understand how we use past information for learning,” Collins told Medical Xpress. “Our brain has multiple mechanisms for learning, and they operate in parallel, even when we learn very simple things (like which key press will give points when I see a specific image). It’s often difficult to identify their contributions well, and sometimes we mistake one for the other.

“With this study, I wanted to show the specific contributions of two mechanisms, working memory (WM) and habits (H), and to show why they could be mistaken for a more popular one, RL.”

RLWM experimental paradigm introduced by Collins. Participants performed multiple independent blocks of an RL task, using deterministic binary feedback to identify which of three actions was correct (Cor.) for each of n_s stimuli. Varying n_s targets WM load and allows researchers to isolate its contribution. Credit: Nature Human Behaviour (2025). DOI: 10.1038/s41562-025-02340-0

Separating mechanisms underpinning reward-based decisions

As part of her study, Collins re-analyzed seven previously published datasets collected as human participants completed reward-based decision-making tasks. During these tasks, participants played a simple computer game that required them to learn what keys they needed to press to gain points as they were viewing different images.

“We know that working memory plays a big role in this task, because participants learn much faster when there are only two images than when there are five or six (where it’s much more difficult to explicitly remember the correct key),” explained Collins. “To compare RL and H, we can focus more on when working memory doesn’t do the full job, with five or six items to learn.”

In the task considered by Collins, participants could also rely heavily on their WM and habit-like behavioral patterns. Specifically, they might try to remember what choices led to positive outcomes or simply keep making similar decisions.

“When you make a choice and get a disappointing outcome, like losing points, RL and WM both say that you should avoid this choice next time,” said Collins. “By contrast, a habit-like process is less focused on outcomes and simply tends to repeat previous choices—good or bad.”
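
The contrast Collins describes can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `rl_update` and `habit_update`, the shared step size `alpha`, and the use of 0 as the "disappointing" outcome are hypothetical choices, not the update rules fitted in the paper.

```python
def rl_update(values, action, reward, alpha=0.5):
    """Outcome-sensitive: move the chosen action's value toward the reward."""
    updated = list(values)
    updated[action] += alpha * (reward - updated[action])
    return updated


def habit_update(strengths, action, alpha=0.5):
    """Outcome-blind: strengthen whatever was just chosen, rewarded or not."""
    updated = list(strengths)
    updated[action] += alpha * (1.0 - updated[action])
    return updated


start = [0.5, 0.5, 0.5]                             # three actions, equal at first
after_rl = rl_update(start, action=0, reward=0.0)   # action 0 was not rewarded
after_habit = habit_update(start, action=0)         # habit never sees the reward
```

After the same unrewarded choice, `after_rl` makes action 0 less attractive, while `after_habit` makes it more likely to be repeated.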

Collins subsequently analyzed the errors made by participants and tried to shed light on what contributed to these errors. She found that people were strongly guided by habit and tended to repeat previous errors, instead of learning from them.

“Computational modeling confirmed that people’s behavior was more consistent with habits supporting working memory, than with RL supporting WM,” she said.

Re-framing RL and inspiring further investigations

The results of the analyses performed by Collins suggest that in humans, reward-based learning is not best explained by standard RL models. Instead, they suggest that this learning can be accounted for by working memory processes and habit-like repetitive behaviors.

“WM and H both have strong limitations: One can only learn very few items for a short time, one can only learn to repeat things (not what is good),” said Collins. “However, when you put them together, they are more than their sum: Because WM steers us towards good actions often enough, it teaches H to repeat the good actions, which lets it learn a good policy. This is an emergent property of two limited systems combining into a more powerful one.”
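
The emergent effect Collins describes can be explored with a toy simulation. The Python sketch below is a loose illustration, not the model from the paper: the function `simulate`, the recall rule `min(1, wm_capacity / n_stimuli)`, the step size `alpha`, and the trial count are all assumptions made for the example. Working memory is outcome-sensitive but unreliable at larger set sizes, while habit simply strengthens whatever action was chosen.

```python
import random


def simulate(n_stimuli=6, n_actions=3, n_trials=3000,
             wm_capacity=3, alpha=0.1, seed=0):
    """Toy working-memory-plus-habit learner (illustrative assumptions only).

    Returns the fraction of stimuli whose strongest habit is the correct action.
    """
    rng = random.Random(seed)
    correct = [rng.randrange(n_actions) for _ in range(n_stimuli)]
    # Habit strengths start uniform; each row always sums to 1.
    habit = [[1.0 / n_actions] * n_actions for _ in range(n_stimuli)]
    known_good = {}                                # stimulus -> rewarded action
    known_bad = [set() for _ in range(n_stimuli)]  # stimulus -> unrewarded actions
    # Assumed load effect: recall is less reliable when set size exceeds capacity.
    p_recall = min(1.0, wm_capacity / n_stimuli)

    for _ in range(n_trials):
        s = rng.randrange(n_stimuli)
        if rng.random() < p_recall:
            # WM is outcome-sensitive: reuse a rewarded action, avoid failed ones.
            if s in known_good:
                a = known_good[s]
            else:
                options = [k for k in range(n_actions) if k not in known_bad[s]]
                a = rng.choice(options)
        else:
            # Habit is outcome-blind: sample in proportion to past choices.
            a = rng.choices(range(n_actions), weights=habit[s])[0]

        if a == correct[s]:
            known_good[s] = a
        else:
            known_bad[s].add(a)

        # The habit update ignores the outcome and strengthens the chosen action.
        for k in range(n_actions):
            target = 1.0 if k == a else 0.0
            habit[s][k] += alpha * (target - habit[s][k])

    hits = sum(row.index(max(row)) == correct[s] for s, row in enumerate(habit))
    return hits / n_stimuli


with_wm = simulate()                  # WM guides choices on about half the trials
habit_only = simulate(wm_capacity=0)  # habit alone never consults outcomes
```

In this toy setup, the habit system never sees a reward, yet with working memory guiding enough choices its strongest associations end up matching the correct actions. With `wm_capacity=0`, habit has no way to tell good actions from bad ones and simply locks onto whatever it happened to repeat.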

This recent study suggests that it is easy to mistake other processes for RL. While it does not rule out the possibility that RL plays a part in how humans learn to make choices linked to maximum rewards, it shows that in some contexts WM and habits play an equally important role.

“It is striking that in the situations considered in my paper, RL is not needed to explain learning, despite it being the dominant modeling framework for this type of learning,” added Collins. “I now want to understand in which circumstances RL does emerge, and what drives our brain to engage WM, RL or H (or other processes) in the context of learning. This will involve doing more precise experiments, and more modeling.”

In the future, the findings gathered by Collins could guide the development of new computational models trained via RL. In addition, they could inspire other research teams to further explore the psychological and neural processes involved in reward-based learning.

“I also want to test whether the H process we find in this task indeed corresponds to what we think of as habits in real life,” added Collins. “I can test different individuals in the task who are differently susceptible to habitual behaviors in real life and see if this is reflected in their behavior in this task.”

Written by Ingrid Fadelli, edited by Lisa Lock.

More information

Anne G. E. Collins, A habit and working memory model as an alternative account of human reward-based learning, Nature Human Behaviour (2025). DOI: 10.1038/s41562-025-02340-0

Journal information:
Nature Human Behaviour

