Can large language models perform clinical anamnesis? Comparative evaluation of ChatGPT, Claude, and Gemini in diagnostic reasoning through case-based questioning in oral and maxillofacial disorders - 22/11/25
, Furkan Ozbey b, Busra Nur Gokkurt Yilmaz c, Hasan Akpinar d
Abstract |
Introduction |
This study aimed to evaluate whether large language models (LLMs) can emulate the clinical anamnesis process and diagnostic reasoning of oral and maxillofacial surgeons.
Materials and methods |
Twenty-five real clinical cases were simulated, drawn from five diagnostic categories: maxillary sinus diseases, periapical pathologies, orofacial pain disorders and neuropathic pain syndromes, odontogenic cysts and tumors, and temporomandibular joint disorders. Three LLMs (ChatGPT-4o, Claude 4, and Gemini 2.5) were each given only the patient’s chief complaint and instructed to ask up to ten sequential questions to reach a diagnosis. One independent evaluator scored model performance on a 100-point scale, deducting 10 points for each additional question asked. Statistical comparisons were conducted using Kruskal–Wallis and Bonferroni post-hoc tests.
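The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring sheet: the function name `anamnesis_score`, the assumption that the first question carries no penalty, and the assumption that a missed diagnosis scores 0 are all hypothetical readings of the abstract.

```python
def anamnesis_score(questions_asked: int, correct: bool,
                    max_questions: int = 10, penalty: int = 10) -> int:
    """Hypothetical score for one case on a 100-point scale.

    Assumptions (not stated in the abstract):
      - the first question is free, and each additional one costs `penalty` points;
      - a case without a correct diagnosis scores 0.
    """
    if not 1 <= questions_asked <= max_questions:
        raise ValueError("questions_asked must be between 1 and max_questions")
    if not correct:
        return 0
    # Deduct the penalty for every question beyond the first.
    return 100 - penalty * (questions_asked - 1)
```

Under these assumptions, a correct diagnosis after four questions would score 70, and one reached only at the tenth question would score 10.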
Results |
No statistically significant difference was found among the models (p = 0.431). Gemini achieved the highest mean diagnostic score (43.6 ± 40.71), followed by ChatGPT-4o (37.2 ± 36.8) and Claude (31.6 ± 33.0). Diagnostic accuracy was highest in moderately difficult cases (p = 0.021) and markedly decreased in difficult ones (p = 0.016).
Conclusion |
LLMs demonstrated the ability to perform structured anamnesis and reach clinically meaningful diagnostic conclusions using limited information. Although no significant difference was observed among the models, Gemini achieved the highest overall mean score. These findings indicate that LLMs hold potential as complementary tools for diagnostic reasoning and as simulation-based educational resources in oral and maxillofacial surgery.
Keywords: Large language models, ChatGPT, Claude, Gemini, Anamnesis, Diagnostic reasoning, Oral and maxillofacial surgery, Artificial intelligence
Vol 127 - N° 2
Article 102644 - March 2026
