Can artificial intelligence accurately detect and summarize anatomy education literature? A comparative analysis of ChatGPT and ScholarGPT - 27/11/25
, S. Kanakaris c, M. Piagkou c, I. Chryssanthou c, A.V. Vasiliadis d, K. Natsis e
Highlights
• ChatGPT and ScholarGPT correctly detected relevant anatomy education studies only at the first and simplest level of question complexity.
• At the next two levels, ScholarGPT performed slightly better, but still did not surpass 50% accuracy.
• For most of the relevant studies, the summaries lacked important information, and there appeared to be a bias in favor of the educational intervention.
• ChatGPT and ScholarGPT are not currently at an adequate level to meaningfully aid researchers in detecting and summarizing studies from the anatomy education literature.
Summary
Purpose |
Artificial intelligence platforms have been suggested as tools that can facilitate anatomy teachers’ work and students’ learning process. We aimed to investigate the ability of ChatGPT to detect and summarize studies from the anatomy education literature, compared to ScholarGPT, a version of ChatGPT specialized in academic research. Secondly, we aimed to explore whether the ability of each platform is influenced by the level of query complexity.
Methods |
We asked the two platforms to list five studies on each of the following three topics: (1) use of virtual reality in anatomy education, (2) use of stereoscopic virtual reality in anatomy education, (3) use of stereoscopic virtual reality in anatomy education involving the user's interaction with the virtual environment. We assessed whether the retrieved studies fulfilled the search criteria, and whether their summaries were accurate (i.e., whether they contained true information about all the educational results reported in the article's abstract).
Results |
ChatGPT's percentages of successful detection were 100%, 60% and 0%, respectively, for the three queries. The percentages of accurate summaries were 60%, 20% and 0%, respectively. ScholarGPT performed better, with successful detection percentages of 100%, 60% and 40%, respectively. The percentages of accurate summaries were 80%, 60% and 40%, respectively. Both platforms showed a bias in favor of the educational intervention.
Conclusions |
ChatGPT and ScholarGPT are not currently at an adequate level to meaningfully aid researchers in detecting and summarizing studies from the anatomy education literature. Ongoing research may increase the ability of these platforms to provide more reliable information.
Keywords: ChatGPT, ScholarGPT, Artificial intelligence, Virtual reality, Anatomy, Anatomy education
Vol 109 - N° 367
Article 101061 - December 2025
