The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation - 05/05/26
, Ming-Chieh Shih b, Daniel De Backer c, Leo Anthony Celi d, Kay Choong See e, Tomoko Fujii f, Lowell Ling g, Wasineenart Mongkolpun h, Hsiang-Wei Hu i, Hsuan-Yu Chen j, Wei-Cheng Chen k, Bernard Cholley l, Kean Khang Fong m, Ho-Geol Ryu n, Sungwon Na o, Moritoki Egi p, Wing-Sum Chan q, Kuan-Fu Chen r, Rishikesan Kamaleswaran s, Yu-Chen Chuang t, Chi-Ju Yang u, Wei-Ling Hsiao v, Sheng-Ru Lai w, Ku David x, Ahsina Jahan y, Greg S. Martin z, ⁎ 
the IMPACT Group 1
Abstract
Background
Generative artificial intelligence (GenAI) is increasingly used for clinical decision support in critical care, yet standardized methods for evaluating GenAI content in intensive care settings are lacking. Existing metrics assess textual similarity but fail to capture clinical accuracy, reasoning quality, or urgency.
Methods
We developed and validated the IMPACT framework through a five-phase multinational panel consensus process. Reporting adhered to the ACCORD guideline. An eight-member steering committee provided clinical and methodological oversight. Panelists were recruited through purposive sampling to ensure geographic and multidisciplinary representation. Content validity was assessed using the Content Validity Ratio (CVR) and the Item-level Content Validity Index (I-CVI), with retention thresholds set at 70% agreement and I-CVI ≥ 0.80.
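As an illustration of the two content validity measures named above, the following is a minimal sketch rather than the authors' analysis code. It assumes the conventional definitions: Lawshe's CVR, computed from the number of panelists who endorse an item, and I-CVI as the share of panelists rating an item 3 or 4 on a 4-point relevance scale. The function names, the 4-point scale, and the reading of "70% agreement" as "at least 70%" are assumptions not stated in the abstract.

```python
from typing import Sequence


def content_validity_ratio(n_endorsing: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to 1."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_endorsing <= n_panelists:
        raise ValueError("n_endorsing must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_endorsing - half) / half


def item_cvi(ratings: Sequence[int], relevant_min: int = 3) -> float:
    """I-CVI: proportion of ratings at or above relevant_min.

    Assumes a 4-point relevance scale where 3 and 4 count as relevant.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def meets_retention(agreement: float, i_cvi: float,
                    min_agreement: float = 0.70,
                    min_i_cvi: float = 0.80) -> bool:
    """Apply the retention thresholds (assumed here to be inclusive)."""
    return agreement >= min_agreement and i_cvi >= min_i_cvi


if __name__ == "__main__":
    # Hypothetical item: 36 of 42 voting panelists endorse it.
    print(round(content_validity_ratio(36, 42), 2))
    # Hypothetical ratings from five panelists on a 4-point scale.
    print(item_cvi([4, 4, 3, 2, 1]))
    print(meets_retention(agreement=36 / 42, i_cvi=0.90))
```

The vote counts and ratings in the usage lines are invented for illustration and are not data from the study.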
Results
A total of 58 panelists from 12 countries and regions participated, with 42 completing formal consensus voting. Participants included intensivists, physicians with AI research expertise, information technology specialists, and other critical care professionals. All six IMPACT domains exceeded validity thresholds (mean agreement 89.3%, CVR = 0.79, I-CVI = 0.92). Of 24 candidate subitems, 21 met retention criteria (mean agreement 85.7%, CVR = 0.71, I-CVI = 0.90). Three subitems were removed due to insufficient consensus and conceptual overlap. The validated framework comprises six domains with 21 subitems.
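The reported mean agreement and mean CVR can be related by a short calculation. This is a sketch under two assumptions not stated in the abstract: that CVR follows Lawshe's formula, and that "agreement" is the proportion $p$ of the $N$ voting panelists who endorsed an item, so that $n_e = pN$.

```latex
\[
\mathrm{CVR} \;=\; \frac{n_e - N/2}{N/2} \;=\; \frac{pN - N/2}{N/2} \;=\; 2p - 1
\]
\[
\text{Domains: } 2(0.893) - 1 = 0.786 \approx 0.79,
\qquad
\text{Subitems: } 2(0.857) - 1 = 0.714 \approx 0.71
\]
```

Because the relation is linear in $p$, it also holds for means across items. Under these assumptions, the reported CVR values are consistent with the reported agreement percentages.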
Conclusions
The IMPACT framework provides a consensus-validated approach for evaluating GenAI clinical decision support in intensive care, addressing gaps in current evaluation methods.
Keywords: Generative artificial intelligence, critical care, clinical decision support, content validity, consensus
Two of the co-authors are members of the editorial board of Annals of Intensive Care: Daniel De Backer and Tomoko Fujii both serve as Associate Editors. Neither was involved in the editorial review or decision-making process for this manuscript.
