
A mother’s health during pregnancy, childbirth and the postpartum period is the foundation of lifelong well-being, directly influencing a child’s development and long-term outcomes. Yet most electronic health record (EHR) systems lack a reliable, standardized method to link mothers with their children.
Researchers from Regenstrief Institute, the Indiana University School of Medicine and partner institutions have developed and validated the first-of-its-kind, large-scale probabilistic maternal–child record linkage algorithm using routinely collected EHR data. The retrospective cohort study, conducted as part of the real-world evidence core of the Maternal and Pediatric Precision in Therapeutics (MPRINT) collaborative, demonstrated that machine learning can reliably identify maternal–child relationships across different health systems, a breakthrough for understanding how a mother’s health influences her child’s outcomes over time.
With the evidence collected in this study, researchers can more effectively pursue large-scale observational studies on maternal-child medication effects, congenital conditions and other health outcomes across expansive EHR populations.
The article, “Derivation and validation of an algorithm for maternal–child linkage in electronic health records,” is published in the Journal of the American Medical Informatics Association.
“The health of a mother—including medications she takes and illnesses she has—can affect a child immediately, or not until years later. Without reliable linkages, it’s been hard for researchers to follow these relationships over time,” said lead author Colin Rogerson, M.D., MPH, Regenstrief and IU School of Medicine Research Scientist.
By establishing accurate maternal–child linkages at scale, researchers can now examine how prenatal exposures influence childhood development and long-term outcomes, including congenital diseases, neurodevelopmental disorders such as autism and ADHD, chronic conditions like asthma and allergies, and rarer diseases that have historically been difficult to study.
While several prior studies have attempted to link mothers and children using state or national datasets with mixed success, this study is the first to accurately achieve large-scale maternal–child linkage by applying machine learning to universally collected EHR demographic data across an expansive, statewide health information exchange.
“No one before this has been able to do what we’ve done here,” said Dr. Rogerson. “Other researchers have tried this with administrative or state-level data, but it has been hard to generalize their results. Our approach uses standard information that every hospital collects, which means other states and health systems should be able to use the same algorithm and achieve similar results.”
Using demographic features such as name, birthdate, phone number and address, the research team applied an XGBoost machine learning model to more than 82 million records, evaluating 6.2 billion potential maternal–child pairs. The algorithm achieved 92% accuracy, 98% precision and an F1-score of 92%, indicating strong performance in identifying true maternal–child connections at scale.
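For readers who want to relate these figures to one another: the F1-score is the harmonic mean of precision and recall, so the two reported values imply an approximate recall. The short Python sketch below does that arithmetic. It is an illustration only; the function name is hypothetical, the inputs are the rounded percentages quoted above, and the result is an estimate rather than a figure reported by the study.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2 * P * R / (P + R) for recall R.

    Hypothetical helper for illustration; not part of the study's code.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 is too large to be consistent with this precision")
    return f1 * precision / denominator


# Rounded values quoted in the article (assumed inputs, so the result is approximate).
reported_precision = 0.98
reported_f1 = 0.92

recall = implied_recall(reported_precision, reported_f1)
print(f"Implied recall: {recall:.1%}")
```

Under these rounded inputs the implied recall is roughly 87%, which would mean the model rarely proposes a wrong mother–child pair but leaves some true pairs unlinked.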
“Linking mothers and children in electronic health records has been a longstanding challenge,” said Regenstrief Vice President for Data and Analytics Shaun Grannis, M.D., M.S. “By leveraging high-quality real-world data and modern machine learning, this work demonstrates how we can responsibly apply AI to answer questions that matter for public health. The ability to generate reliable maternal-child linkages across systems opens the door to discoveries that weren’t possible before.”
Publication details
Colin M Rogerson et al, Derivation and validation of an algorithm for maternal–child linkage in electronic health records, Journal of the American Medical Informatics Association (2025). DOI: 10.1093/jamia/ocaf177
