The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation - 05/05/26

DOI: 10.1016/j.aicoj.2026.100078

Yu-Chang Yeh ^a,⁎ , Ming-Chieh Shih ^b, Daniel De Backer ^c, Leo Anthony Celi ^d, Kay Choong See ^e, Tomoko Fujii ^f, Lowell Ling ^g, Wasineenart Mongkolpun ^h, Hsiang-Wei Hu ⁱ, Hsuan-Yu Chen ^j, Wei-Cheng Chen ^k, Bernard Cholley ^l, Kean Khang Fong ^m, Ho-Geol Ryu ⁿ, Sungwon Na ^o, Moritoki Egi ^p, Wing-Sum Chan ^q, Kuan-Fu Chen ^r, Rishikesan Kamaleswaran ^s, Yu-Chen Chuang ^t, Chi-Ju Yang ^u, Wei-Ling Hsiao ^v, Sheng-Ru Lai ^w, Ku David ^x, Ahsina Jahan ^y, Greg S. Martin ^z,⁎

the IMPACT Group ^¹
Members of the IMPACT Group are listed in the Acknowledgment section.

^a Department of Anesthesiology, National Taiwan University Hospital, Taipei, Taiwan

^b School of Medicine, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan

^c Department of Intensive Care, CHIREC Hospitals, Université Libre de Bruxelles, Brussels, Belgium

^d Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, United States of America

^e Division of Respiratory and Critical Care Medicine, Department of Medicine, National University Hospital, Singapore

^f Department of Intensive Care, Jikei University Hospital, Tokyo, Japan

^g Department of Anaesthesia and Intensive Care, Chinese University of Hong Kong, Hong Kong SAR, China

^h Division of Critical Care Medicine, Department of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand

ⁱ Taiwan Artificial Intelligence Association, Taipei, Taiwan

^j Department of Orthopedic Surgery, National Taiwan University Hospital, Taipei, Taiwan

^k Respiratory Intensive Care Unit, China Medical University Hospital, Taichung, Taiwan

^l Department of Anesthesiology and Intensive Care Medicine, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France

^m Department of Medicine, Queen Elizabeth Hospital, Kota Kinabalu, Malaysia

ⁿ Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, Seoul, South Korea

^o Department of Anesthesiology and Pain Medicine, Severance Hospital, Seoul, South Korea

^p Department of Anesthesiology and Intensive Care Medicine, Kyoto University Hospital, Kyoto, Japan

^q Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei, Taiwan

^r Clinical Informatics and Medical Statistics Research Center, Chang Gung University, Taoyuan, Taiwan

^s Department of Surgery and Department of Anesthesiology, School of Medicine, Duke University, Durham, NC, United States of America

^t Information Technology Office, National Taiwan University Hospital, Taipei, Taiwan

^u Department of Pharmacy, National Taiwan University Hospital, Taipei, Taiwan

^v Department of Nursing, National Taiwan University Hospital, Taipei, Taiwan

^w Department of Dietetics, National Taiwan University Hospital, Taipei, Taiwan

^x Monash University, Melbourne, Australia

^y Department of Critical Care Medicine, Ashiyan Medical College Hospital, Dhaka, Bangladesh

^z Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, School of Medicine, Emory University, Atlanta, United States of America

^⁎ Corresponding authors.

In press. Accepted manuscript. Available online since Tuesday 05 May 2026.
This article has been published in an issue of the journal.

Abstract

Background

Generative artificial intelligence (GenAI) is increasingly used for clinical decision support in critical care, yet standardized methods for evaluating GenAI content in intensive care settings are lacking. Existing metrics assess textual similarity but fail to capture clinical accuracy, reasoning quality, or urgency.

Methods

We developed and validated the IMPACT framework through a five-phase multinational panel consensus process. Reporting adhered to the ACCORD guideline. An eight-member steering committee provided clinical and methodological oversight. Panelists were recruited through purposive sampling to ensure geographic and multidisciplinary representation. Content validity was assessed using the Content Validity Ratio (CVR) and the Item-level Content Validity Index (I-CVI), with retention thresholds of ≥ 70% agreement and I-CVI ≥ 0.80.
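The two content validity statistics named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes Lawshe's standard CVR formula, CVR = (n_e − N/2) / (N/2), and the common I-CVI convention of counting ratings of 3 or 4 on a 4-point relevance scale; the abstract does not state the rating scale, so `relevant_min` and all function names are hypothetical.

```python
from typing import Sequence


def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


def item_cvi(ratings: Sequence[int], relevant_min: int = 3) -> float:
    """I-CVI: share of panelists rating the item at or above relevant_min.

    Assumption: a 4-point relevance scale where 3 and 4 count as relevant.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def meets_retention(agreement: float, i_cvi: float,
                    min_agreement: float = 0.70,
                    min_i_cvi: float = 0.80) -> bool:
    """Apply the retention thresholds stated in the Methods (both must hold)."""
    return agreement >= min_agreement and i_cvi >= min_i_cvi


if __name__ == "__main__":
    # Hypothetical item: 36 of 42 voting panelists endorse it.
    print(content_validity_ratio(36, 42))
    print(item_cvi([4, 4, 3, 2, 4]))
    print(meets_retention(agreement=36 / 42, i_cvi=0.90))
```

The example counts are invented for illustration; only the panel size of 42 voters and the two thresholds come from the abstract.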

Results

A total of 58 panelists from 12 countries and regions participated, with 42 completing formal consensus voting. Participants included intensivists, physicians with AI research expertise, information technology specialists, and other critical care professionals. All six IMPACT domains exceeded validity thresholds (mean agreement 89.3%, CVR = 0.79, I-CVI = 0.92). Of 24 candidate subitems, 21 met retention criteria (mean agreement 85.7%, CVR = 0.71, I-CVI = 0.90). Three subitems were removed due to insufficient consensus and conceptual overlap. The validated framework comprises six domains with 21 subitems.
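As a rough consistency check on these figures: if agreement is taken as the proportion p of panelists endorsing an item, and CVR follows Lawshe's formula (both are assumptions; the abstract does not define either), then CVR reduces to 2p − 1. The sketch below applies that identity to the reported means; the function and variable names are illustrative only.

```python
def cvr_from_agreement(p: float) -> float:
    """Lawshe's CVR rewritten in terms of the endorsing proportion p = n_e / N."""
    return 2 * p - 1


# (mean agreement, reported CVR) as given in the Results
reported = {
    "domains": (0.893, 0.79),
    "subitems": (0.857, 0.71),
}

for level, (agreement, reported_cvr) in reported.items():
    implied = cvr_from_agreement(agreement)
    print(f"{level}: implied CVR {implied:.2f}, reported CVR {reported_cvr:.2f}")

# Subitem bookkeeping: 24 candidates minus 3 removed leaves 21 retained.
candidates, removed = 24, 3
print("retained subitems:", candidates - removed)
```

Under these assumptions the implied values round to the reported CVRs at both the domain and subitem level, and the subitem counts are internally consistent.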

Conclusions

The IMPACT framework provides a consensus-validated approach for evaluating GenAI clinical decision support in intensive care, addressing gaps in current evaluation methods.

Keywords: Generative artificial intelligence, critical care, clinical decision support, content validity, consensus

Two of the co-authors are members of the editorial board of Annals of Intensive Care: Daniel De Backer and Tomoko Fujii each serve as an Associate Editor. Neither was involved in the editorial review or decision-making process for this manuscript.