HMN 2026: How AI advice matches physician recommendations in early-stage liver cancer, but falls short in late stage

LLM treatment advice agrees with physician recommendations in early-stage HCC, but falls short in late stage
A graphical overview of the study evaluating the clinical utility of large language models (LLMs) for hepatocellular carcinoma treatment. The study analyzed 13,614 patients to compare real-world physician decisions with recommendations from ChatGPT, Gemini, and Claude. The findings reveal that while LLM concordance is associated with improved survival in early-stage disease, it correlates with worse outcomes in advanced stages due to divergent clinical priorities. Credit: Keungmo Yang and Ji Won Han, The Catholic University of Korea (CC-BY 4.0, https://creativecommons.org/licenses/by/4.0/)

Large language models (LLMs) can generate treatment recommendations for straightforward cases of hepatocellular carcinoma (HCC) that align with clinical guidelines but fall short in more complex cases, according to a new study by Ji Won Han from The Catholic University of Korea and colleagues, published in the open-access journal PLOS Medicine.

Choosing the most appropriate treatment for patients with liver cancer is complicated. While international treatment guidelines provide recommendations, clinicians must tailor their treatment choice to cancer stage and liver function, as well as to other factors such as comorbidities.

To assess whether LLMs can provide treatment recommendations for hepatocellular carcinoma (HCC) that reflect real-world clinical practice, researchers compared suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with actual treatments received by more than 13,000 newly diagnosed patients with HCC in South Korea.

They found that, in patients with early-stage HCC, higher agreement between LLM recommendations and actual treatments was associated with improved survival. The inverse was seen in patients with advanced-stage disease, where higher agreement between LLM recommendations and actual practice was associated with worse survival. LLMs placed greater emphasis on tumor factors, such as tumor size and number of tumors, while physicians prioritized liver function.

Overall, the findings suggest that LLMs may help to support straightforward treatment decisions, particularly in early-stage disease, but are not presently suitable for guiding care decisions for more complex cases that require nuanced clinical judgment. Regardless of stage, LLM advice should be used with caution and considered as a supplement to clinical expertise.

The authors add, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”

Publication details

Yang K, et al. Evaluating the clinical utility of large language models for hepatocellular carcinoma treatment recommendations: A nationwide retrospective registry study. PLOS Medicine (2026). DOI: 10.1371/journal.pmed.1004855

Journal information:
PLOS Medicine


Key medical concepts

Carcinoma, Hepatocellular; Liver Neoplasms

Clinical categories

Oncology; Gastroenterology

