HMN 2026: Mapping cell development with mathematics-informed machine learning

Benchmarking scDiffEq performance using lineage-traced haematopoietic development data. Credit: Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01150-3

The development of humans and other animals unfolds gradually over time, with cells taking on specific roles and functions via a process called cell fate determination. The fate of individual cells, or in other words, what type of cells they will become, is influenced both by predictable biological signals and random physiological fluctuations.

Over the past decades, medical researchers and neuroscientists have been able to study these processes in greater depth using single-cell RNA sequencing (scRNA-seq), an experimental technique that measures the gene activity of individual cells.

To better understand how cells develop over time, researchers also rely on mathematical models. One of these models, dubbed the drift-diffusion equation, describes the evolution of systems as the combination of predictable changes (i.e., drift) and randomness (i.e., diffusion).
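As a generic illustration, in textbook notation rather than the paper's, a drift-diffusion model of a cell's state x can be written as a stochastic differential equation. Here f is the drift, g is the diffusion, and W is a Wiener process (Brownian motion) that supplies the randomness. Simulations typically step the equation forward with the Euler-Maruyama scheme on the second line, where each step adds a deterministic nudge plus Gaussian noise scaled by the square root of the step size.

```latex
\begin{aligned}
dx_t &= f(x_t)\,dt + g(x_t)\,dW_t \\
x_{t+\Delta t} &\approx x_t + f(x_t)\,\Delta t + g(x_t)\,\sqrt{\Delta t}\;\varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, I)
\end{aligned}
```

If g is a constant, every cell experiences the same level of noise; letting g depend on x allows the amount of noise to differ from one cell state to another.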

Researchers at Broad Institute of MIT and Harvard, Massachusetts General Hospital and Harvard Medical School developed a new machine learning framework based on this mathematical model that can be used to study cell development and the determination of their fate. This framework, called scDiffEq, was introduced in a paper recently published in Nature Machine Intelligence.

“This project came together thanks to Michael Vinyard, who was a shared Ph.D. student between my lab and the Getz lab,” Luca Pinello, co-author of the paper, told Phys.org. “He transitioned from an experimentally focused lab to our labs focused on computational biology, driven by his passion for exploring new computational modeling approaches.”

scDiffEq algorithm overview and applications. Credit: Vinyard et al., Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01150-3.

New advanced tools to study cell development

Pinello, Vinyard, Getz and their colleagues wanted to apply recent advances in the field of machine learning to the study of cell development. They specifically tried to shed light on how cells choose their “fate” using mathematical models called neural stochastic differential equations.

“Deep computational modeling of single cell trajectories could be used to predict the effect of perturbations, which is critical for finding drug targets and designing clinical interventions,” said Gad Getz, co-senior author of the paper.

“The team became stronger when we gained the support of Allon Klein, who developed the key lineage-traced dataset (LARRY) that we used to benchmark our method. After discussions with colleagues in the field, we converged on this modeling framework with crucial contributions from Anders Rasmussen and Ruitong Li.”

By analyzing high-quality lineage-traced data collected over the past few years, this large team of researchers refined their approach. Their collective efforts ultimately led to the development of the scDiffEq framework.

“I was inspired by the idea that the diversity of cell types in the human body arises programmatically from a single pluripotent cell,” explained Michael Vinyard, first author of the paper.

“These developmental processes are hijacked in disease; many cancers feature mutations that disrupt cell fate decision-making. For over a century, differential equations have been a mainstay for modeling dynamical systems. More recently, they’ve been applied to gene regulatory networks, the biological circuits that govern how cells make these decisions.”

In recent years, researchers have collected increasingly precise, high-resolution measurements using scRNA-seq and other advanced experimental tools. These techniques can profile more than 20,000 genes per cell, but data of such high dimensionality proved intractable for models rooted in traditional differential equations.

“Neural differential equations changed this, providing a framework for learning equations that describe cell state dynamics directly from data,” said Vinyard.

“GPU-accelerated numerical approximation lets us scale beyond simple, constrained systems to high-dimensional observations of cells. With learned equations in hand, you can do more than fit observed data: you can simulate cell behavior under perturbation, generating hypotheses about how genetic or pharmacological interventions might impact cell fate.”

Underpinnings and advantages of the scDiffEq model

The new machine learning-based framework developed by the researchers models how cells change over time using neural stochastic differential equations. The team then used the model to predict the fate of cells (i.e., what type of cells they will become).

“Think of cells as particles moving through a landscape: some forces push them in specific directions (the ‘drift,’ representing deterministic regulatory programs), while random fluctuations also influence their path (the ‘diffusion,’ representing biological noise),” explained Pinello. “Previous methods treated biological noise as uniform across all cells and cell states. scDiffEq instead learns that different cell states experience different levels of stochasticity, which may better reflect real biological systems.”
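The difference Pinello describes can be illustrated with a toy one-dimensional simulation. This sketch is not scDiffEq's code. The double-well drift, the two diffusion functions and the euler_maruyama helper are hypothetical choices. Noise is assumed to be strongest at the decision point (x = 0) and weak for committed cells near the two fates at x = -1 and x = +1.

```python
import numpy as np

def drift(x):
    # Hypothetical double-well drift: pushes cells toward a fate at x = -1 or x = +1
    return x - x**3

def diffusion_state_dependent(x):
    # Hypothetical: noise peaks at the decision point (x = 0) and
    # decays for committed cells near x = +/-1
    return 0.1 + 0.5 * np.exp(-4.0 * x**2)

def diffusion_constant(x):
    # The "uniform noise" assumption: same noise level in every cell state
    return np.full_like(x, 0.3)

def euler_maruyama(x0, drift_fn, diff_fn, dt=0.01, n_steps=500, seed=0):
    # Simulate many independent cells forward in time with Euler-Maruyama steps
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + drift_fn(x) * dt + diff_fn(x) * np.sqrt(dt) * noise
    return x

# 1,000 progenitors at the decision point, and 1,000 cells already committed to x = +1
progenitors = euler_maruyama(np.zeros(1000), drift, diffusion_state_dependent)
committed_sd = euler_maruyama(np.ones(1000), drift, diffusion_state_dependent).std()
committed_const = euler_maruyama(np.ones(1000), drift, diffusion_constant).std()

print("fraction of progenitors choosing the +1 fate:", (progenitors > 0).mean())
print("spread of committed cells, state-dependent vs constant noise:",
      committed_sd, committed_const)
```

In this toy setup, progenitors that start at the decision point still split between the two fates under state-dependent noise, while committed cells stay tightly clustered. Under the constant-noise assumption, committed cells are noticeably more spread out.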

To assess the framework’s potential, the researchers first used it to trace the development of blood cells from real lineage-traced scRNA-seq data. The model predicted the fates of individual cells with 58% accuracy, 8% higher than the accuracy achieved by previously introduced models.

“scDiffEq learns equations describing how cells change state over time from single-cell data,” said Vinyard. “At its core, it’s built on neural stochastic differential equations, using neural networks to parameterize both the deterministic (drift) and stochastic (diffusion) dynamics driving cell state change.”
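The following minimal sketch shows what using neural networks to parameterize both drift and diffusion can look like. It is written in plain NumPy rather than the deep learning framework a real implementation would use. The tiny multilayer perceptrons, their sizes and the 5-dimensional cell state are hypothetical, and the networks are randomly initialized rather than trained. A softplus keeps the diffusion positive. None of this is scDiffEq's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    # Hypothetical tiny multilayer perceptron: a list of (weights, bias) pairs
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    # tanh hidden layers, linear output layer
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def softplus(z):
    # log(1 + e^z), computed stably; keeps diffusion magnitudes positive
    return np.logaddexp(0.0, z)

dim = 5  # hypothetical 5-dimensional cell state
drift_params = init_mlp([dim, 32, dim], rng)      # network for the drift f(x)
diffusion_params = init_mlp([dim, 32, dim], rng)  # network for the diffusion g(x)

def sde_step(x, dt, rng):
    # One Euler-Maruyama step with diagonal, state-dependent diffusion
    f = mlp(drift_params, x)
    g = softplus(mlp(diffusion_params, x))
    return x + f * dt + g * np.sqrt(dt) * rng.standard_normal(x.shape)

cells = rng.normal(size=(100, dim))  # 100 starting cells
for _ in range(50):
    cells = sde_step(cells, 0.02, rng)
```

Training would typically adjust drift_params and diffusion_params so that simulated cell populations resemble the observed ones over time; that fitting step is omitted here.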

A novel characteristic of the team’s model is that it learns predictable (i.e., drift) and random (i.e., diffusion) factors influencing cell development as a function of a cell’s state. This is in stark contrast with previously introduced machine learning-based models, which either ignore diffusion entirely or treat it as a constant across all cells.

“scDiffEq learns how stochasticity itself varies across the cell landscape: a progenitor cell sitting at a fate decision may have a different stochasticity profile than that of a terminally differentiated cell,” said Vinyard.

“In practice, you give it cells with temporal information (real or approximated), and it learns a vector field over cell state space. Once trained, you can simulate trajectories from any starting population, predict cell fate outcomes, or introduce perturbations and study the resulting changes.”
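The simulate-then-classify workflow Vinyard describes can be sketched as follows. The sketch assumes a hand-written two-dimensional drift in place of a trained one, and the two terminal fates are hypothetical. Each starting cell is simulated many times, and its fate probabilities are the fractions of trajectories that end nearest each fate.

```python
import numpy as np

# Hypothetical terminal fates (e.g. two mature cell types) in a 2-D state space
FATES = {"fate_A": np.array([2.0, 0.0]), "fate_B": np.array([-2.0, 0.0])}

def drift(x):
    # Stand-in for a trained drift field: a double well along axis 0
    # (stable points at -2 and +2) and relaxation toward 0 along axis 1
    return np.stack([x[:, 0] - x[:, 0] ** 3 / 4.0, -x[:, 1]], axis=1)

def diffusion(x):
    # Constant noise, kept simple for this sketch
    return 0.5 * np.ones_like(x)

def fate_probabilities(start_cells, n_sims=200, dt=0.01, n_steps=400, seed=0):
    rng = np.random.default_rng(seed)
    names = list(FATES)
    centers = np.stack([FATES[n] for n in names])  # shape (n_fates, 2)
    probs = np.zeros((len(start_cells), len(names)))
    for i, cell in enumerate(start_cells):
        x = np.tile(cell, (n_sims, 1))  # n_sims copies of the same starting cell
        for _ in range(n_steps):
            noise = rng.standard_normal(x.shape)
            x = x + drift(x) * dt + diffusion(x) * np.sqrt(dt) * noise
        # Assign each simulated endpoint to its nearest terminal fate
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        probs[i] = np.bincount(nearest, minlength=len(names)) / n_sims
    return names, probs

start = np.array([[-0.5, 0.0], [0.0, 0.0], [0.5, 0.0]])
names, probs = fate_probabilities(start)
print(names)
print(probs)
```

In this toy field, the cell starting at x = 0.5 is far more likely to reach fate_A than the cell at x = -0.5, while the cell at the origin splits roughly evenly between the two fates.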

New avenues for neuroscience and medical research

This recent study shows that the explicit modeling of cell state-dependent “randomness” can significantly improve the prediction of cell fates. In the future, other teams could draw inspiration from their work and set out to develop similar machine learning models guided by neural stochastic differential equations.

“Our results suggest that cells may have programmed modules that deliberately alter the ‘noisiness’ during development, and capturing this is essential for accurate modeling,” said Pinello.

“We also explored whether our model could predict the effect of perturbations at the gene level, since this can be an important tool for in silico assessment of potentially therapeutic interventions. scDiffEq can reconstruct high-resolution developmental trajectories and perform in silico genome-wide perturbation screens. It also generalizes to single time-point datasets, which represent the majority of available single-cell experiments.”
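An in silico perturbation screen can be sketched in the same spirit: perturb one gene at a time, re-simulate, and record how the predicted fate distribution shifts. Everything here is a hypothetical toy. The three-gene drift is invented for illustration, a knockdown is modeled as lowering a gene's starting level, and the fate is called from the sign of gene_0. scDiffEq's own perturbation procedure may differ.

```python
import numpy as np

GENE_NAMES = ["gene_0", "gene_1", "gene_2"]  # hypothetical genes

def drift(x):
    # Toy stand-in for a learned drift. gene_0 carries a two-fate switch
    # (stable states near -2 and +2), gene_1 transiently biases gene_0,
    # and gene_2 relaxes independently and never influences the fate.
    g0, g1, g2 = x[:, 0], x[:, 1], x[:, 2]
    return np.stack([g0 - g0 ** 3 / 4.0 + 0.5 * g1, -g1, -g2], axis=1)

def fate_a_fraction(start, dt=0.01, n_steps=400, sigma=0.5, seed=0):
    # Fraction of simulated cells ending in fate A (gene_0 > 0). A fixed seed
    # gives every condition the same noise draws, so differences between
    # conditions reflect the perturbation rather than sampling noise.
    rng = np.random.default_rng(seed)
    x = start.copy()
    for _ in range(n_steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return float((x[:, 0] > 0).mean())

def knockdown_screen(n_cells=1000, shift=-1.0):
    baseline_start = np.zeros((n_cells, len(GENE_NAMES)))
    baseline = fate_a_fraction(baseline_start)
    effects = {}
    for k, name in enumerate(GENE_NAMES):
        perturbed = baseline_start.copy()
        perturbed[:, k] += shift  # hypothetical "knockdown": lower one gene's starting level
        effects[name] = fate_a_fraction(perturbed) - baseline
    return baseline, effects

baseline, effects = knockdown_screen()
print("baseline fate A fraction:", baseline)
print("change in fate A fraction per knockdown:", effects)
```

Because every condition reuses the same random seed, gene_2, which never feeds into the fate decision, shows no effect at all, while knocking down gene_0 or gene_1 pushes cells toward fate B.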

As part of their next studies, Vinyard, Pinello, Getz and their colleagues plan to further improve the scDiffEq model and address some of its limitations. For instance, the model currently operates in a lower-dimensional space, as explicitly modeling all genes would be highly challenging. In the future, they hope to introduce more scalable models that can reliably capture dynamics directly at the level of individual genes.
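The article does not specify which lower-dimensional representation scDiffEq uses. The sketch below assumes principal component analysis (PCA), a common choice for single-cell data, and shows the round trip such a model depends on. Gene expression is projected into a latent space where dynamics would be modeled, and latent states are then mapped back to approximate gene-level values. The synthetic data and the helper names fit_pca, to_latent and to_genes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression matrix: 300 cells x 2,000 genes driven by 3 hidden programs plus noise
n_cells, n_genes, n_programs = 300, 2000, 3
programs = rng.normal(size=(n_programs, n_genes))
activity = rng.normal(size=(n_cells, n_programs))
expression = activity @ programs + 0.1 * rng.normal(size=(n_cells, n_genes))

def fit_pca(X, n_components):
    # Center the data; rows of vt are the principal axes (gene loadings)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def to_latent(X, mean, components):
    # Project cells into the low-dimensional space where dynamics would be modeled
    return (X - mean) @ components.T

def to_genes(Z, mean, components):
    # Map latent states (e.g. simulated trajectories) back to approximate gene expression
    return Z @ components + mean

mean, components = fit_pca(expression, n_components=10)
latent = to_latent(expression, mean, components)  # shape (300, 10)
reconstructed = to_genes(latent, mean, components)
rel_error = np.linalg.norm(expression - reconstructed) / np.linalg.norm(expression - mean)
print("latent shape:", latent.shape, "relative reconstruction error:", rel_error)
```

In this synthetic example the expression matrix is essentially low-rank, so 10 components reconstruct it closely. Real single-cell data are rarely so clean, and gene-level predictions can only be as good as this mapping.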

“We also currently use only gene expression levels,” added Pinello. “Eventually, we plan to integrate other molecular layers such as chromatin accessibility to build more comprehensive models.

“Finally, there is tremendous opportunity in integrating perturbation data from recent large-scale experiments that combine single-cell measurements with CRISPR perturbations, often referred to as Perturb-seq. Training on such data could dramatically improve our in-silico perturbation predictions.”

Written for you by our author Ingrid Fadelli, edited by Gaby Clark.

Publication details

Michael E. Vinyard et al, Learning cell dynamics with neural differential equations, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01150-3.

