Chatbots vs andrologists: testing 25 clinical cases - 08/04/24
Abstract
Objective: AI-based language models are booming, yet their place in medicine remains undefined. The aim of our study was to compare the responses of chatbots and andrologists to andrology clinical cases, in order to assess the reliability of these technologies.
Material and method: We analyzed the responses of 32 experts, 18 residents and three chatbots (ChatGPT v3.5, ChatGPT v4 and Bard) to 25 andrology clinical cases. For each question, responses were rated on a Likert scale from 0 to 2 (0 = false or no response; 1 = partially correct response; 2 = correct response), based on the latest national recommendations or, where none existed, international recommendations. We then compared the mean scores obtained across all cases by the different groups.
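The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the per-case ratings shown are invented for the example, and the abstract does not specify how many questions each case contained.

```python
# Hypothetical sketch of the scoring scheme described in the abstract:
# each question is rated 0 (false/no response), 1 (partially correct)
# or 2 (correct); a responder's score for a case is the sum of its
# question ratings, and groups are compared on the mean across cases.
from statistics import mean, stdev

def case_score(ratings):
    """Sum the Likert ratings (each 0, 1 or 2) for one clinical case."""
    assert all(r in (0, 1, 2) for r in ratings), "ratings must be 0, 1 or 2"
    return sum(ratings)

# Illustrative (invented) data: per-question ratings for one responder
cases = [
    [2, 2, 1, 2, 2, 1],   # case 1: six questions
    [2, 1, 0, 2],         # case 2: four questions
    [2, 2, 2, 2, 1],      # case 3: five questions
]
scores = [case_score(c) for c in cases]
print(mean(scores), round(stdev(scores), 2))  # mean and SD across cases
```

The group-level comparisons reported in the results would then be run on such per-case means for experts, residents and each chatbot.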
Results: Experts obtained a higher mean score (m = 11.0/12.4, σ = 1.4) than ChatGPT v4 (m = 10.7/12.4, σ = 2.2, p = 0.6475), ChatGPT v3.5 (m = 9.5/12.4, σ = 2.1, p = 0.0062) and Bard (m = 7.2/12.4, σ = 3.3, p < 0.0001). Residents obtained a mean score (m = 9.4/12.4, σ = 1.7) higher than Bard (m = 7.2/12.4, σ = 3.3, p = 0.0053) but lower than ChatGPT v3.5 (m = 9.5/12.4, σ = 2.1, p = 0.8393), ChatGPT v4 (m = 10.7/12.4, σ = 2.2, p = 0.0183) and the experts (m = 11.0/12.4, σ = 1.4, p = 0.0009). ChatGPT v4 (m = 10.7, σ = 2.2) performed better than ChatGPT v3.5 (m = 9.5, σ = 2.1, p = 0.0476) and Bard (m = 7.2, σ = 3.3, p < 0.0001).
Conclusion: Chatbots could find a relevant place in medicine. Further studies are needed before they can be integrated into clinical practice.
Level of evidence 4
Keywords: Artificial intelligence, andrology, clinical reasoning, natural language processing