Validation of a machine learning model for predicting early deterioration in the emergency department - 11/06/26

Abstract
Early recognition of patients at risk for deterioration in the emergency department (ED) is critical for patient safety. Traditional early warning scores rely on structured triage data and often perform poorly in the dynamic ED environment. We developed and evaluated two machine learning models that integrate structured triage data with transformer-based embeddings of free-text nursing triage notes to predict early clinical deterioration before the initial physician assessment. The models are designed as a risk-based prioritization tool that ranks patients by predicted probability of an adverse outcome.
We analyzed 17,481 consecutive adult ED visits over six months. Structured variables (demographics, vital signs, eCTAS scores) were combined with BioClinicalBERT-derived embeddings from free-text nursing triage notes to form a multimodal feature representation. Two XGBoost models (A, B) were trained on the same binary classification task, predicting “early deterioration” (ICU admission or death within 7 days, prevalence 4.5%) versus all other outcomes, differing only in class weighting. Model A used standard class weighting; Model B applied increased weighting to the early deterioration class to prioritize identification of high-risk patients.
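The feature fusion and class weighting described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `build_features` and `positive_class_weight`, the `boost` factor, and the toy cohort are all assumptions made for illustration, and the abstract does not report the weights actually used. A weight computed this way is the kind of value that could be passed to XGBoost's `scale_pos_weight` parameter.

```python
from typing import List, Sequence


def build_features(structured: Sequence[float],
                   note_embedding: Sequence[float]) -> List[float]:
    """Concatenate structured triage variables with a note embedding.

    Hypothetical helper: the abstract only says the two sources were
    combined into one multimodal representation.
    """
    return list(structured) + list(note_embedding)


def positive_class_weight(labels: Sequence[int], boost: float = 1.0) -> float:
    """Return the negative-to-positive ratio, scaled by an assumed boost.

    boost=1.0 corresponds to plain inverse-frequency weighting; a larger
    boost up-weights the rare class further (value chosen for illustration).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples in labels")
    return boost * n_neg / n_pos


# Toy cohort matching the reported 4.5% prevalence (45 of 1000 visits).
labels = [1] * 45 + [0] * 955
weight_a = positive_class_weight(labels)             # inverse-frequency weight
weight_b = positive_class_weight(labels, boost=2.0)  # assumed heavier weight

# One toy visit: three structured values plus a two-dimensional embedding.
row = build_features([67.0, 22.0, 98.0], [0.12, -0.40])
```

In this sketch the only difference between the two configurations is the weight on the positive class, mirroring the statement that Models A and B differ only in class weighting.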
Model A achieved a recall of 0.66 (95% CI: 0.59–0.73), precision of 0.17 (95% CI: 0.15–0.20), and ROC-AUC of 0.75 (95% CI: 0.72–0.79). Model B improved recall to 0.77 (95% CI: 0.72–0.84), precision to 0.22 (95% CI: 0.19–0.25), and ROC-AUC to 0.90 (95% CI: 0.88–0.92). While XGBoost's internal feature importance attributed the majority of predictive weight to free-text embeddings, SHAP analysis identified age, respiratory rate, and systolic blood pressure as the dominant individual contributors, with triage note embeddings providing meaningful incremental value confirmed by structured-variable ablation.
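As a rough illustration of how recall and precision with 95% confidence intervals might be obtained, the sketch below uses a percentile bootstrap over visits. The abstract does not state how its intervals were computed, so the bootstrap is an assumption, as are the function names and the toy labels, which are not study data.

```python
import random
from typing import Callable, List, Sequence, Tuple


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true deterioration cases that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of flagged visits that were true deterioration cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def bootstrap_ci(y_true: Sequence[int], y_pred: Sequence[int],
                 metric: Callable[[Sequence[int], Sequence[int]], float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample visits with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy labels: 10 positives (7 flagged) and 90 negatives (20 flagged).
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 7 + [0] * 3 + [1] * 20 + [0] * 70

rec = recall(y_true, y_pred)
prec = precision(y_true, y_pred)
rec_lo, rec_hi = bootstrap_ci(y_true, y_pred, recall)
```

With a rare outcome, precision stays low even at moderate recall because false positives are drawn from a much larger pool of negatives, which is consistent with the pattern of values reported above.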
These findings suggest that AI-driven risk prioritization may function as an adjunct layer of situational awareness in the ED, complementing clinical judgement rather than replacing it. Safe clinical adoption will require prospective shadow testing in real-time workflows to quantify ranking accuracy, assess operational feasibility, and evaluate impact on decision-making before any clinician-facing implementation.
Highlights
• Multimodal AI predicts ED deterioration using triage notes and structured EHR data.
• Weighted XGBoost model achieves 0.90 ROC-AUC and 0.77 recall for severe outcomes.
• Clinical text embeddings provide an 8% increase in overall predictive accuracy.
• Tool supports situational awareness by prioritizing occult high-risk ED patients.
• SHAP analysis confirms age and vitals as dominant predictors of deterioration.
Keywords: Emergency department, Early deterioration, Machine learning, Predictive modeling, Clinical decision support, Risk stratification
Vol 107
P. 77-82 - September 2026
