Information recorded over time in medical records tells more about diseases

Electronic health records (EHRs) contain important information about patients’ health outlook and the care they receive, but the records are not always precise. A new study describes an approach that uses machine learning, a type of artificial intelligence, to track the sequence of entries in patients’ EHRs over time and predict their likelihood of having or developing different diseases. The study was led by researchers at Massachusetts General Hospital (MGH) and is published in the Cell Press journal Patterns.

“Over the past decade, billions of dollars have been spent to institute meaningful use of EHR systems. For a multitude of reasons, however, EHR data are still complex and have ample quality issues, which make it difficult to leverage these data to address pressing health issues, especially during pandemics such as COVID-19, when rapid responses are needed,” said lead author Hossein Estiri, PhD, of the MGH Laboratory of Computer Science. “In this paper, we propose an algorithm for exploiting the temporal information in the EHRs that is distorted by layers of administrative and healthcare system processes.”

The strategy links the medications and diagnoses recorded in a patient’s EHR over time, rather than treating each entry as an independent record. Analyses showed that this sequential approach can accurately estimate the likelihood that a patient actually has an underlying disease.

“Our study doesn’t rely on single diagnostic codes but instead relies on sequences of codes with the expectation that a sequence of relevant characteristics over time is more likely to represent reality than a single element,” Dr. Estiri said. “Additionally, the computer sorts through thousands of patients and can find sequences that a physician would likely never identify on their own as relevant, but actually are associated with the disease.”

As an example, a record of coronary artery disease followed by chest pain was more useful for predicting the development of heart failure than either factor on its own or the two in the reverse order.
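To make the idea of ordered sequences concrete, here is a minimal Python sketch of how "A followed by B" pairs could be pulled from time-stamped records and counted across patients. It is an illustration only, not the study's algorithm: the function names, the record layout (date, code) and the three toy patients are all assumptions invented for this example.

```python
from collections import Counter
from itertools import combinations


def ordered_pairs(records):
    """Return the set of (earlier_code, later_code) pairs for one patient.

    `records` is a list of (iso_date, code) tuples (a hypothetical layout).
    Only the first occurrence of each code is used, and codes first seen
    on the same day are not treated as ordered.
    """
    first_seen = {}
    for date, code in sorted(records):
        first_seen.setdefault(code, date)  # keep the earliest date per code

    timeline = sorted(first_seen.items(), key=lambda item: item[1])
    pairs = set()
    for (code_a, date_a), (code_b, date_b) in combinations(timeline, 2):
        if date_a < date_b:
            pairs.add((code_a, code_b))
    return pairs


def pair_counts(patients):
    """Count in how many patients each ordered pair appears."""
    counts = Counter()
    for records in patients.values():
        counts.update(ordered_pairs(records))
    return counts


# Toy data, invented for illustration; not taken from the study.
patients = {
    "p1": [("2019-01-10", "coronary_artery_disease"), ("2019-06-02", "chest_pain")],
    "p2": [("2018-03-01", "chest_pain"), ("2018-09-15", "coronary_artery_disease")],
    "p3": [
        ("2020-02-01", "coronary_artery_disease"),
        ("2020-02-20", "chest_pain"),
        ("2020-05-05", "chest_pain"),
    ],
}

counts = pair_counts(patients)
print(counts[("coronary_artery_disease", "chest_pain")])
print(counts[("chest_pain", "coronary_artery_disease")])
```

Because the pair is ordered, "coronary artery disease then chest pain" and "chest pain then coronary artery disease" are counted separately, which is what allows a downstream model to weigh one order differently from the other.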

The method can therefore identify disease markers that are interpretable by clinicians. This could lead to new computational models for identifying and validating new disease markers and for advancing medical discoveries. The proposed way of thinking about medical records could also help identify patients in a community who are at risk of developing a variety of other diseases and recommend their evaluation by healthcare providers.