HMN 2025: What is the dental caries variability in a national survey database?


A new article published in the Journal of Dental Research describes an integrated data-cleaning and subtype discovery pipeline that uses unsupervised machine learning to analyze and visualize data patterns in the National Health and Nutrition Examination Survey (NHANES) database.

The study, “Uncovering Dental Caries Heterogeneity in NHANES Using Machine Learning,” authored by Alena Orlenko (Cedars-Sinai Medical Center, Los Angeles) and colleagues, addresses limitations of NHANES, one of the largest curated repositories of nationally representative, population-level health indicators. To do so, it combines a data-cleaning pipeline built around a novel outlier detection algorithm with unsupervised machine learning to identify phenotype subtypes within the NHANES dental caries data.

“By bringing the power of machine learning to a large national data set, the authors identify key clusters of factors linked to caries in children or seniors,” said Nick Jakubovics, Editor-in-Chief of the Journal of Dental Research. “The next challenge is to build on this information and find more effective methods to prevent caries in different groups of people.”

The study demonstrates a robust pipeline combining data cleaning with subtype discovery that could be applied to other health conditions in NHANES and in similar databases used for machine learning predictive modeling. Applied to NHANES data, this comprehensive bioinformatics pipeline identified substantial age-driven heterogeneity in dental caries, suggesting that stratifying by age is crucial for future predictive modeling.

This integrative approach systematically addresses data-quality issues and supports exploratory analysis, revealing data patterns associated with the subtypes and the variables associated with the clinical heterogeneity of caries. It uncovered novel associations between caries status and lead/pollutant exposure, specific laboratory markers, food types, and sleep patterns, reflecting additional disease markers in susceptible populations. These findings demonstrate the value of integrating data science techniques with large-scale observational data to gain deeper insights into complex, multifactorial diseases.
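The article does not spell out the novel outlier detection algorithm or the specific clustering method, so what follows is only a minimal Python sketch of the general two-stage idea (clean the data, then look for subtypes) under stated assumptions. A standard median-absolute-deviation (MAD) rule stands in for the study's outlier detector, k-means stands in for its unsupervised subtype discovery, and the records (age in years, number of decayed teeth) are invented toy values rather than NHANES data. All function and variable names are hypothetical.

```python
import statistics

# Hypothetical toy records: (age in years, number of decayed teeth).
# Invented values for illustration only; NOT NHANES data.
ROWS = [(6, 2), (7, 3), (8, 2), (9, 4), (10, 3),
        (65, 9), (68, 10), (70, 8), (72, 11), (75, 12),
        (40, 99)]  # last row mimics a data-entry error


def mad_outlier_mask(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold` (MAD rule).

    A common robust rule, used here as a stand-in; it is NOT the
    study's novel outlier detection algorithm.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def drop_outlier_rows(rows):
    """Remove any row flagged as an outlier in at least one column."""
    masks = [mad_outlier_mask(col) for col in zip(*rows)]
    return [row for i, row in enumerate(rows) if not any(m[i] for m in masks)]


def standardize(rows):
    """Rescale each column to mean 0 and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(statistics.fmean(d) for d in zip(*members)) if members else centroids[c])
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return labels


clean = drop_outlier_rows(ROWS)
labels = kmeans(standardize(clean), k=2)
print(len(ROWS) - len(clean), "row(s) removed; cluster labels:", labels)
# prints: 1 row(s) removed; cluster labels: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With these hypothetical values, the MAD rule drops the implausible 99-tooth record, and k-means splits the remaining rows into a younger and an older group. Standardizing the columns first keeps age, which spans a much wider numeric range, from dominating the distance calculation.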

More information

A. Orlenko et al, Uncovering Dental Caries Heterogeneity in NHANES Using Machine Learning, Journal of Dental Research (2025). DOI: 10.1177/00220345251398027

Journal information:
Journal of Dental Research



Provided by
International Association for Dental, Oral, and Craniofacial Research

