Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses - 05/03/25

DOI: 10.1016/j.ajem.2024.12.024

B. Arslan ⁎, C. Nuhoglu, M.O. Satici, E. Altinbilek
Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey

⁎ Corresponding author.

Abstract

Background

The number of emergency department (ED) visits has been steadily increasing worldwide. Artificial intelligence (AI) technologies, including generative AI tools based on large language models (LLMs), have shown promise in improving triage accuracy. This study evaluates the triage performance of ChatGPT and Copilot at a high-volume urban hospital, hypothesizing that these tools can match trained physicians' accuracy and reduce human bias amid ED crowding.

Methods

This single-center, prospective observational study was conducted in an urban ED over one week. Adult patients were enrolled during randomly selected 24-h intervals. Minors, trauma cases, and patients with incomplete data were excluded. Triage nurses assessed patients while an emergency medicine (EM) physician documented clinical vignettes and assigned Emergency Severity Index (ESI) levels. These vignettes were then presented to ChatGPT and Copilot, and the resulting triage levels were compared with the triage nurses' decisions.

Results

The overall triage accuracy was 65.2 % for nurses, 66.5 % for ChatGPT, and 61.8 % for Copilot, with no significant difference (p = 0.000). Moderate agreement was observed between the EM physician and ChatGPT, triage nurses, and Copilot (Cohen's Kappa = 0.537, 0.477, and 0.472, respectively). In recognizing high-acuity patients, ChatGPT and Copilot outperformed triage nurses (87.8 % and 85.7 % versus 32.7 %, respectively). Compared to ChatGPT and Copilot, nurses significantly under-triaged patients (p < 0.05). The analysis of predictive performance for ChatGPT, Copilot, and triage nurses demonstrated varying discrimination abilities across ESI levels, all of which were statistically significant (p < 0.05). ChatGPT and Copilot exhibited consistent accuracy across age, gender, and admission time, whereas triage nurses were more likely to mistriage patients under 45 years old.
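The headline metrics above (overall accuracy, under-triage, recognition of high-acuity patients, and Cohen's Kappa against the EM physician) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the Kappa is treated as unweighted, "high-acuity" is taken to mean ESI levels 1 and 2, and the ESI lists are made-up values for illustration, not study data. All function names are hypothetical.

```python
from collections import Counter


def accuracy(reference, rated):
    """Fraction of cases where the rated ESI level equals the reference level."""
    return sum(r == x for r, x in zip(reference, rated)) / len(reference)


def triage_errors(reference, rated):
    """Return (under_triage_rate, over_triage_rate).

    ESI runs from 1 (most urgent) to 5 (least urgent), so a rated level that
    is numerically higher than the reference means the case was under-triaged.
    """
    n = len(reference)
    under = sum(x > r for r, x in zip(reference, rated)) / n
    over = sum(x < r for r, x in zip(reference, rated)) / n
    return under, over


def high_acuity_recall(reference, rated, high=(1, 2)):
    """Share of reference high-acuity cases that the rater also placed in `high`.

    ASSUMPTION: high acuity = ESI 1-2; the abstract does not define the term.
    """
    flagged = [x in high for r, x in zip(reference, rated) if r in high]
    return sum(flagged) / len(flagged)


def cohen_kappa(reference, rated):
    """Unweighted Cohen's Kappa: (p_o - p_e) / (1 - p_e).

    ASSUMPTION: the abstract does not say whether a weighted Kappa was used.
    """
    n = len(reference)
    p_o = accuracy(reference, rated)  # observed agreement
    ref_counts, rated_counts = Counter(reference), Counter(rated)
    # Agreement expected by chance from the two marginal distributions.
    p_e = sum(ref_counts[k] * rated_counts[k] for k in ref_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up ESI levels for ten vignettes (NOT data from the study).
physician = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]  # reference standard
rater = [1, 2, 3, 3, 3, 4, 4, 4, 5, 3]      # nurse or LLM output

acc = accuracy(physician, rater)
under, over = triage_errors(physician, rater)
recall = high_acuity_recall(physician, rater)
kappa = cohen_kappa(physician, rater)
print(f"accuracy={acc:.3f} under={under:.3f} over={over:.3f} "
      f"high-acuity recall={recall:.3f} kappa={kappa:.3f}")
```

In this toy example the rater never over-triages, yet misses half of the high-acuity cases. This mirrors, in miniature, how a rater can have reasonable overall accuracy while still under-triaging the sickest patients.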

Conclusion

ChatGPT and Copilot outperform traditional nurse triage in identifying high-acuity patients, but real-time ED capacity data is crucial to prevent overcrowding and ensure high-quality emergency care.


Keywords: ChatGPT, Copilot, Triage, Emergency medicine, Emergency severity index, Large language models, Generative artificial intelligence

Abbreviations: AI, AUC, ED, EM, ESI, GPT, LLMs, ML, NLP, NPV, PPV

Outline

Introduction

Materials and methods

Study setting and population

Data collection and procedure

LLM-based generative AI tools

Data analysis

Results

Discussion

Implications for ED practices

Limitations

Conclusion

Animal and human rights statement

Funding

Author contributions

CRediT authorship contribution statement


Vol 89, P. 174-181 - March 2025


