
Comparison between multimodal foundation models and radiologists for the diagnosis of challenging neuroradiology cases with text and images

DOI: 10.1016/j.diii.2025.04.006
Bastien Le Guellec a, b, c, Cyril Bruge d, Najib Chalhoub a, Victor Chaton e, Edouard De Sousa a, Yann Gaillandre a, Riyad Hanafi a, Matthieu Masy f, Quentin Vannod-Michel a, Aghiles Hamroun b, g, Grégory Kuchcinski a, c
on behalf of the ARIANES investigators

a Department of Neuroradiology, CHU Lille, Salengro Hospital, Lille 59000, France 
b Université Lille, INSERM, CHU Lille, Institut Pasteur de Lille, U1167-RID-AGE - Facteurs de Risque et Déterminants Moléculaires des Maladies Liées au Vieillissement, Lille 59000, France 
c INSERM, U1172–LilNCog-Lille Neuroscience & Cognition, Université de Lille, Lille 59000, France 
d Department of Radiology, Lens Hospital, Lens 62300, France 
e Department of Neuroradiology, Saint Philibert Hospital, Lille 59160, France 
f Department of Neuroradiology, Valenciennes Hospital, Valenciennes 59300, France 
g Public Health – Epidemiology Department, CHU Lille, Maison Régionale de la Recherche Clinique, Lille 59000, France 

Corresponding author.

Highlights

Multimodal models (GPT-4o and Gemini 1.5 Pro) outperform neuroradiologists in suggesting diagnoses from clinical context alone (34.0 % and 44.7 % vs. 16.4 %, respectively; P < 0.01).
Neuroradiologists outperform multimodal models (GPT-4o and Gemini 1.5 Pro) when using images alone (42.0 % vs. 3.8 % and 7.5 %; P < 0.01) and when combining images and text (48.0 % vs. 34.0 % and 38.7 %; P < 0.001).
The multimodal models show limitations in identifying abnormal findings, with frequent hallucinations, and fail to integrate multimodal inputs effectively.
Neuroradiologists improve their accuracy from 47.2 % to 56.0 % with the assistance of Gemini 1.5 Pro (P < 0.01).


Abstract

Purpose

The purpose of this study was to compare the ability of two multimodal models (GPT-4o and Gemini 1.5 Pro) with that of radiologists to generate differential diagnoses from the textual context alone, key images alone, or a combination of both, using complex neuroradiology cases.

Materials and methods

This retrospective study included neuroradiology cases from the "Diagnosis Please" series published in the journal Radiology between January 2008 and September 2024. The two multimodal models were asked to provide three differential diagnoses from the textual context alone, the key images alone, or the complete case. Six board-certified neuroradiologists, randomly assigned to two groups (context alone first or images alone first), solved the cases in the same setting. Three radiologists solved the cases without and then with the assistance of Gemini 1.5 Pro. An independent radiologist evaluated the quality of the image descriptions provided by GPT-4o and Gemini 1.5 Pro for each case. Differences in correct answers between the multimodal models and radiologists were analyzed using the McNemar test.
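For illustration only, the sketch below shows how a multimodal model can be queried with a case's textual context and a key image to obtain three differential diagnoses. The prompt wording, the placeholder file name, and the helper function are assumptions made for this example and do not reproduce the authors' exact protocol; the sketch assumes the OpenAI Python SDK with an API key available in the environment.

    # Minimal sketch (assumed setup, not the authors' protocol): query a
    # multimodal model with the clinical context and one key image, asking
    # for three differential diagnoses.
    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def image_data_url(path: str) -> str:
        # Encode a local image file as a base64 data URL accepted by the API.
        with open(path, "rb") as f:
            return "data:image/png;base64," + base64.b64encode(f.read()).decode()

    clinical_context = "..."  # textual context of the case (placeholder)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Suggest the three most likely diagnoses for this case.\n"
                         + clinical_context},
                {"type": "image_url",
                 "image_url": {"url": image_data_url("key_image.png")}},  # hypothetical file
            ],
        }],
    )
    print(response.choices[0].message.content)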

Results

GPT-4o and Gemini 1.5 Pro outperformed radiologists using clinical context alone (mean accuracy, 34.0 % [18/53] and 44.7 % [23.7/53] vs. 16.4 % [8.7/53]; both P < 0.01). Radiologists outperformed GPT-4o and Gemini 1.5 Pro using images alone (mean accuracy, 42.0 % [22.3/53] vs. 3.8 % [2/53], and 7.5 % [4/53]; both P < 0.01) and the complete cases (48.0 % [25.6/53] vs. 34.0 % [18/53], and 38.7 % [20.3/53]; both P < 0.001). While radiologists improved their accuracy when combining multimodal information (from 42.1 % [22.3/53] for images alone to 50.3 % [26.7/53] for complete cases; P < 0.01), GPT-4o and Gemini 1.5 Pro did not benefit from the multimodal context (from 34.0 % [18/53] for text alone to 35.2 % [18.7/53] for complete cases for GPT-4o; P = 0.48, and from 44.7 % [23.7/53] to 42.8 % [22.7/53] for Gemini 1.5 Pro; P = 0.54). Radiologists benefited significantly from the suggestion of Gemini 1.5 Pro, increasing their accuracy from 47.2 % [25/53] to 56.0 % [27/53] (P < 0.01). Both GPT-4o and Gemini 1.5 Pro correctly identified the imaging modality in 53/53 (100 %) and 51/53 (96.2 %) cases, respectively, but frequently failed to identify key imaging findings (43/53 cases [81.1 %] with incorrect identification of key imaging findings for GPT-4o and 50/53 [94.3 %] for Gemini 1.5 Pro).
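As a complement to the paired comparisons reported above, the sketch below shows how per-case correctness of a model and of radiologists on the same cases can be compared with the McNemar test using statsmodels; the 2 × 2 counts are invented for illustration and are not the study data.

    # Illustrative McNemar test on paired per-case correctness
    # (the counts below are made up, not the study's data).
    from statsmodels.stats.contingency_tables import mcnemar

    # 2x2 table over the same 53 cases:
    # rows = model correct / incorrect, columns = radiologist correct / incorrect
    table = [[12, 6],    # model correct:   radiologist correct, incorrect
             [14, 21]]   # model incorrect: radiologist correct, incorrect

    # Only the discordant cells (6 vs. 14) drive the statistic; the exact
    # binomial form is preferred when discordant counts are small.
    result = mcnemar(table, exact=True)
    print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")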

Conclusion

Radiologists show a specific ability to benefit from the integration of textual and visual information, whereas multimodal models mostly rely on the clinical context to suggest diagnoses.


Keywords: Artificial intelligence, ChatGPT, Gemini, Large language models, Multimodal models

Abbreviations: CT, computed tomography; GPT, generative pre-trained transformer; LLM, large language model; MRI, magnetic resonance imaging; SD, standard deviation




© 2025 Published by Elsevier Masson SAS.

Vol 106 - N° 10 - P. 345-352 - October 2025
