
A perspective in Frontiers in Artificial Intelligence titled “Evidence-based AI: from trailblazer to trustblazer?” introduces a formal discipline called Evidence-based AI that applies the rigorous standards of medicine and toxicology to agentic software systems. The paper was led by Insilica founder and CEO Dr. Thomas Luechtefeld in collaboration with Dr. Thomas Hartung of the Johns Hopkins Bloomberg School of Public Health.
The core of this framework is the Evidence-based Agent Stack. This architecture provides the philosophical and structural foundation for Insilica’s ToxIndex platform. While traditional generative AI focuses on writing fluently, the Evidence-based AI approach requires systems to show their work through machine-actionable provenance and version-pinned data.
“Fluent text is not the same as defensible evidence,” said Dr. Luechtefeld. “Trailblazing AI optimizes for performance and speed; trustblazing AI optimizes for traceability, reproducibility, and accountability. We built ToxIndex to be the second kind. Every claim is bound to a source span, every run is version-pinned, and every step is auditable.”
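As an illustration of what binding a claim to a source span and pinning a version could look like in code, the following is a minimal sketch only. It is not taken from the paper or from ToxIndex; the names `SourceSpan`, `Claim`, `bind_claim`, and `verify_claim`, and the use of character offsets with a SHA-256 digest, are all assumptions made for the example.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class SourceSpan:
    """Hypothetical pointer to an exact passage in a pinned source version."""
    doc_id: str        # stable identifier of the source document
    doc_version: str   # version of the source the span was taken from
    start: int         # character offset where the passage begins
    end: int           # character offset where the passage ends (exclusive)


@dataclass(frozen=True)
class Claim:
    """A statement plus the span and digest of the passage that supports it."""
    text: str
    span: SourceSpan
    quote_sha256: str


def _digest(passage: str) -> str:
    return hashlib.sha256(passage.encode("utf-8")).hexdigest()


def bind_claim(text, doc_id, doc_version, doc_text, start, end):
    """Attach a claim to a passage and record a digest of that passage."""
    if not (0 <= start < end <= len(doc_text)):
        raise ValueError("span lies outside the document")
    span = SourceSpan(doc_id, doc_version, start, end)
    return Claim(text, span, _digest(doc_text[start:end]))


def verify_claim(claim, doc_version, doc_text):
    """Return True only if the version matches and the passage is unchanged."""
    if doc_version != claim.span.doc_version:
        return False
    passage = doc_text[claim.span.start:claim.span.end]
    return _digest(passage) == claim.quote_sha256
```

In this sketch an auditor can re-run `verify_claim` against the pinned document at any later time; a changed version label or an edited passage causes the check to fail rather than pass silently.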
The paper describes a nine-agent architecture that is already operational in the production version of ToxIndex. A protocol agent locks the research question and eligibility criteria before any screening begins. The retrieval agent then queries more than 2,000 databases and 90 million regulatory documents, returning rows and passages with stable identifiers. Subsequent agents handle screening and extraction, explicitly marking missing data as missing rather than inferring values.
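Two of the behaviors described above, locking the protocol before screening and marking missing data instead of inferring it, can be sketched in a few lines. This is a hedged illustration under assumed names (`Protocol`, `require_locked`, `extract_fields`, and the `NOT_REPORTED` marker are hypothetical), not the ToxIndex implementation.

```python
from dataclasses import dataclass
import hashlib
import json

# Hypothetical sentinel recorded when a source does not report a value.
NOT_REPORTED = "NOT_REPORTED"


@dataclass(frozen=True)
class Protocol:
    """Research question and eligibility criteria, immutable once created."""
    question: str
    eligibility: tuple

    def fingerprint(self) -> str:
        # Deterministic digest of the protocol, stored at lock time.
        payload = json.dumps(
            {"question": self.question, "eligibility": list(self.eligibility)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def require_locked(protocol: Protocol, locked_fingerprint: str) -> None:
    """Refuse to proceed if the protocol differs from the locked one."""
    if protocol.fingerprint() != locked_fingerprint:
        raise RuntimeError("protocol differs from the locked version")


def extract_fields(record: dict, fields: list) -> dict:
    """Copy reported values; mark absent ones explicitly, never guess them."""
    return {
        name: record[name] if record.get(name) is not None else NOT_REPORTED
        for name in fields
    }
```

For example, `extract_fields({"dose": 10}, ["dose", "species"])` keeps the reported dose and records the species as `NOT_REPORTED`, so downstream steps can see that the value was absent in the source.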
ToxIndex also incorporates specialized agents for risk of bias and causal modeling. These tools use established frameworks such as RoB 2 and directed acyclic graphs to ensure that mechanistic reasoning remains transparent. A dedicated uncertainty agent produces a calibrated register for every conclusion while an evidence-to-decision agent translates certainty into final recommendations.
This methodology aligns with emerging regulatory principles such as TREAT and e-validation. These standards replace the traditional “validate-and-freeze” model with a lifecycle approach focused on continuous monitoring and drift detection. By embedding these controls directly into the software pipeline, Insilica ensures that narrative fluency is converted into scientific reliability at every gate.
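The article does not say how drift is detected. Purely as an illustration of the lifecycle idea, the sketch below compares a recent window of a monitored quality metric against a frozen baseline and flags a shift in the mean that exceeds a chosen number of standard errors. The function name `drift_detected`, the choice of metric, and the default threshold of 3.0 are assumptions for this example, and a simple z-score check stands in for whatever method a production system would use.

```python
from math import sqrt
from statistics import mean, stdev


def drift_detected(baseline, recent, threshold=3.0):
    """Flag drift when the recent mean is far from the baseline mean.

    baseline: metric values collected at validation time (at least two).
    recent:   metric values from the current monitoring window.
    threshold: allowed distance, in standard errors of the recent mean.
    """
    if len(baseline) < 2 or len(recent) < 1:
        raise ValueError("need at least two baseline values and one recent value")
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A constant baseline: any change in the mean counts as drift.
        return shift > 0
    standard_error = spread / sqrt(len(recent))
    return shift / standard_error > threshold
```

Run on a schedule, a check like this turns a one-time validation into continuous monitoring: a run that trips the threshold would be routed to review instead of being released.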
“For decades, evidence-based medicine and evidence-based toxicology have given us the playbook of protocolized questions, reproducible retrieval, structured appraisal, graded certainty,” said Dr. Hartung. “What agentic AI adds is the ability to run that playbook as software, at scale, without diluting the standards. ToxIndex is the first concrete implementation of evidence-based AI in regulatory toxicology, and it is the most credible answer I have seen to the hallucination problem in high-stakes science.”
Publication details
Thomas Luechtefeld et al., Evidence-based AI: from trailblazer to trustblazer?, Frontiers in Artificial Intelligence (2026). DOI: 10.3389/frai.2026.1818128
Provided by
Insilica
