Sepsis as Seen through the Eyes of AI: A Comparative Evaluation of ChatGPT and Gemini - 24/12/25
Rahmet Güner

Highlights
• Cross-sectional benchmarking of ChatGPT-4o vs Gemini 2.5 Flash on 82 sepsis questions (FAQ + SSC) was carried out.
• Two infectious-disease raters independently applied the Global Quality Scale; reproducibility was tested by repeat queries.
• Gemini delivered markedly higher quality (GQS 5 in 94% vs 35.4% for ChatGPT) and greater reproducibility (97.5% vs 76.5% overall).
• Both models underperformed in the “prevention” domain, indicating a shared weakness.
• The findings support rigorous benchmarking and domain-specific optimization before public deployment.
Abstract
Introduction |
Large language models (LLMs) are increasingly used to seek health information online. Although these tools have great potential to improve digital health literacy, little is known about their accuracy and consistency, particularly for life-threatening conditions such as sepsis. The aim of this study was to evaluate and compare the effectiveness of two widely used LLMs, ChatGPT-4o and Gemini 2.5 Flash, in providing accurate and consistent answers to questions about sepsis.
Material and Methods
A cross-sectional benchmarking study was conducted using a standardized set of sepsis-related questions comprising two main categories: frequently asked questions (FAQs) and items drawn from the Surviving Sepsis Campaign (SSC) guidelines. The responses generated by the two models were independently assessed by two raters using the Global Quality Scale (GQS), and reproducibility was evaluated by submitting each question twice.
Results
Gemini significantly outperformed ChatGPT in overall quality and reproducibility. Specifically, 94% of Gemini’s responses received the highest rating (GQS 5), compared with only 35.4% of ChatGPT’s answers. Gemini also demonstrated higher reproducibility (97.5% vs. 76.5%). Both models underperformed in the “prevention” domain. Overall, Gemini showed greater potential than ChatGPT for delivering accurate and consistent sepsis-related health information, which is crucial for patients and caregivers alike.
Conclusion
These findings underscore the importance of rigorous benchmarking before integrating LLMs into digital health platforms, and illustrate a need for refinement of LLMs to enhance their reliability in public-facing health communication.
Keywords: Artificial intelligence, Sepsis, ChatGPT, Gemini, Large language model
Vol 56 - No 1
Article 105228 - January 2026