Can large language models perform clinical anamnesis? Comparative evaluation of ChatGPT, Claude, and Gemini in diagnostic reasoning through case-based questioning in oral and maxillofacial disorders - 22/11/25

DOI: 10.1016/j.jormas.2025.102644

Birkan Eyup Yilmaz ^a,^⁎ , Furkan Ozbey ^b, Busra Nur Gokkurt Yilmaz ^c, Hasan Akpinar ^d

^a Faculty of Dentistry, Department of Oral and Maxillofacial Surgery, Giresun University, Giresun, Türkiye

^b Faculty of Dentistry, Department of Dentomaxillofacial Radiology, Afyonkarahisar Health Sciences University, Afyonkarahisar, Türkiye

^c Giresun Oral and Dental Health Center, Department of Dentomaxillofacial Radiology, Giresun, Türkiye

^d Faculty of Dentistry, Department of Oral and Maxillofacial Surgery, Afyonkarahisar Health Sciences University, Afyonkarahisar, Türkiye

^⁎ Corresponding author at: Giresun University, Faculty of Dentistry, Department of Oral and Maxillofacial Surgery, Giresun, Türkiye.

Abstract

Introduction

This study aimed to evaluate whether large language models (LLMs) can emulate the clinical anamnesis process and diagnostic reasoning of oral and maxillofacial surgeons.

Materials and methods

Twenty-five real clinical cases from five diagnostic categories (maxillary sinus diseases; periapical pathologies; orofacial pain disorders and neuropathic pain syndromes; odontogenic cysts and tumors; and temporomandibular joint disorders) were simulated. Three LLMs (ChatGPT-4o, Claude 4, and Gemini 2.5) were each provided only the patient’s chief complaint and instructed to ask up to ten sequential questions to reach a diagnosis. One independent evaluator scored each model's performance on a 100-point scale, deducting 10 points for each additional question asked. Statistical comparisons were conducted using the Kruskal–Wallis test and Bonferroni post-hoc tests.
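The scoring rule and the omnibus test described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' implementation. The abstract states only the 100-point scale, the 10-point deduction per additional question, and the ten-question cap. The sketch assumes that the first question carries no penalty and that an incorrect or missing diagnosis scores 0. The function names `diagnostic_score` and `kruskal_wallis_h` are hypothetical, and the H statistic is computed with the standard tie-corrected formula rather than with whatever software the study used.

```python
from collections import Counter
from typing import Sequence

MAX_QUESTIONS = 10      # stated in the abstract
FULL_SCORE = 100        # stated in the abstract
PENALTY_PER_EXTRA = 10  # stated in the abstract


def diagnostic_score(questions_asked: int, correct: bool) -> int:
    """Score one simulated case for one model.

    Assumptions (not stated in the abstract): the first question is free,
    and a wrong or missing diagnosis scores 0.
    """
    if not 1 <= questions_asked <= MAX_QUESTIONS:
        raise ValueError("questions_asked must be between 1 and 10")
    if not correct:
        return 0
    return FULL_SCORE - PENALTY_PER_EXTRA * (questions_asked - 1)


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum of (rank sum squared / group size) over all groups.
    weighted = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction
```

Under these assumptions a model that reaches the correct diagnosis after four questions would score 70, and one that needs all ten questions would score 10. The H statistic would then be compared against a chi-squared distribution with (number of groups minus 1) degrees of freedom to obtain a p-value.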

Results

No statistically significant difference was found among the models (p = 0.431). Gemini achieved the highest mean diagnostic score (43.6 ± 40.71), followed by ChatGPT-4o (37.2 ± 36.8) and Claude (31.6 ± 33.0). Diagnostic accuracy was highest in moderately difficult cases (p = 0.021) and markedly decreased in difficult ones (p = 0.016).

Conclusion

LLMs demonstrated the ability to perform structured anamnesis and reach clinically meaningful diagnostic conclusions using limited information. Although no significant difference was observed among the models, Gemini achieved the highest overall mean score. These findings indicate that LLMs hold potential as complementary tools for diagnostic reasoning and as simulation-based educational resources in oral and maxillofacial surgery.

The full text of this article is available as a PDF.

Keywords: Large language models, ChatGPT, Claude, Gemini, Anamnesis, Diagnostic reasoning, Oral and maxillofacial surgery, Artificial intelligence

Outline

Introduction

Materials and methods

Ethical approval

Study design

Case selection and grouping

Reference standard determination

Interaction protocol and anamnesis simulation

Blinding and bias control

CRediT authorship contribution statement

Vol 127 - No. 2

Article 102644 - March 2026
