Validation of a machine learning model for predicting early deterioration in the emergency department - 11/06/26

DOI: 10.1016/j.ajem.2026.05.007

Yerin R. Lee ^a, Ian Ruffolo ^b, Pouria Mashouri ^b, Michael Brudno ^b, Maxim Ben-Yakov ^a,^b,^c,^d,^⁎

^a Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

^b Department of Computer Science, University of Toronto, Toronto, ON, Canada

^c Department of Medicine, University of Toronto, Toronto, ON, Canada

^d Department of Emergency Medicine, University of Toronto, Toronto, ON, Canada

^⁎ Corresponding author at: Toronto General Hospital, 200 Elizabeth Street, Room 480, R. Fraser Elliott Building, Ground, Toronto, ON M5G 2C4, Canada.

Abstract

Early recognition of patients at risk for deterioration in the emergency department (ED) is critical for patient safety. Traditional early warning scores rely on structured triage data and often perform poorly in the dynamic ED environment. We developed and evaluated two machine learning models that integrate structured triage data with transformer-based embeddings of free-text nursing triage notes to predict early clinical deterioration before initial physician assessment. The models are designed as a risk-based prioritization tool that ranks patients by predicted probability of an adverse outcome.

We analyzed 17,481 consecutive adult ED visits over six months. Structured variables (demographics, vital signs, eCTAS scores) were combined with BioClinicalBERT-derived embeddings from free-text nursing triage notes to form a multimodal feature representation. Two XGBoost models (A, B) were trained on the same binary classification task, predicting “early deterioration” (ICU admission or death within 7 days, prevalence 4.5%) versus all other outcomes, differing only in class weighting. Model A used standard class weighting; Model B applied increased weighting to the early deterioration class to prioritize identification of high-risk patients.
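As a rough illustration of the setup described above, the following sketch shows how a structured triage row might be joined with a note embedding, and how a class weight could be derived for XGBoost's `scale_pos_weight` parameter. The function names, the toy values, the use of the negative-to-positive ratio as the "standard" weight, and the 2.0 multiplier for Model B are all assumptions made for illustration; the abstract does not report the actual weights or how the embeddings were pooled.

```python
# Sketch only: names, toy values, and the weighting multiplier are assumptions,
# not the authors' code.

def build_features(structured, note_embedding):
    """Concatenate structured triage values with a note embedding into one row."""
    return list(structured) + list(note_embedding)


def balanced_pos_weight(labels):
    """Ratio of negatives to positives, a common heuristic for scale_pos_weight."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive cases in labels")
    return negatives / positives


# Toy cohort mirroring the reported 4.5% prevalence: 45 positives per 1,000 visits.
labels = [1] * 45 + [0] * 955
base_weight = balanced_pos_weight(labels)

# Hypothetical parameter sets; the 2.0 multiplier is illustrative, not reported.
params_a = {"objective": "binary:logistic", "scale_pos_weight": base_weight}
params_b = {**params_a, "scale_pos_weight": base_weight * 2.0}

# Hypothetical row: three structured values followed by a 3-dimensional stand-in
# for what would in practice be a much longer BioClinicalBERT embedding.
row = build_features([71, 24, 98], [0.12, -0.40, 0.33])
```

At a prevalence of 4.5% the balanced ratio is 955/45, or about 21.2, so under this heuristic each deterioration case would carry roughly the weight of 21 other visits during training.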

Model A achieved a recall of 0.66 (95% CI: 0.59–0.73), precision of 0.17 (95% CI: 0.15–0.20), and ROC-AUC of 0.75 (95% CI: 0.72–0.79). Model B improved recall to 0.77 (95% CI: 0.72–0.84), precision to 0.22 (95% CI: 0.19–0.25), and ROC-AUC to 0.90 (95% CI: 0.88–0.92). While XGBoost's internal feature importance attributed the majority of predictive weight to free-text embeddings, SHAP analysis identified age, respiratory rate, and systolic blood pressure as the dominant individual contributors, with triage note embeddings providing meaningful incremental value confirmed by structured-variable ablation.
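For readers less familiar with these metrics, the sketch below computes recall and precision from binary predictions and estimates a 95% confidence interval with a percentile bootstrap. The abstract does not state how its intervals were obtained, so the bootstrap is an assumption here, and the toy labels are invented purely for illustration.

```python
import random


def recall_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels; 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def bootstrap_ci(y_true, y_pred, metric_index, n_boot=1000, seed=0):
    """Percentile 95% CI obtained by resampling visits with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = recall_precision([y_true[i] for i in idx], [y_pred[i] for i in idx])
        stats.append(sample[metric_index])
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Toy data (invented for illustration): 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 7 + [0] * 3 + [1] * 20 + [0] * 70

recall, precision = recall_precision(y_true, y_pred)
recall_lo, recall_hi = bootstrap_ci(y_true, y_pred, metric_index=0)
```

Read this way, Model B's reported precision of 0.22 means that about 22 of every 100 flagged patients would go on to meet the deterioration outcome, while its recall of 0.77 means that about 77 of every 100 patients who deteriorate would be flagged.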

These findings suggest that AI-driven risk prioritization may function as an adjunct layer of situational awareness in the ED, complementing clinical judgement rather than replacing it. Safe clinical adoption will require prospective shadow testing in real-time workflows to quantify ranking accuracy, assess operational feasibility, and evaluate impact on decision-making before any clinician-facing implementation.

The full text of this article is available as a PDF.

Highlights

•	Multimodal AI predicts ED deterioration using triage notes and structured EHR data.
•	Weighted XGBoost model achieves 0.90 ROC-AUC and 0.77 recall for severe outcomes.
•	Clinical text embeddings provide an 8% increase in overall predictive accuracy.
•	Tool supports situational awareness by prioritizing occult high-risk ED patients.
•	SHAP analysis confirms age and vitals as dominant predictors of deterioration.

Keywords: Emergency department, Early deterioration, Machine learning, Predictive modeling, Clinical decision support, Risk stratification

Outline

Introduction

Methods

Data Source and Study Population

Data Structure and Preprocessing

Model Development and Training

CRediT authorship contribution statement

Vol 107

P. 77-82 - September 2026
