Are answers obtained from artificial intelligence models for information purposes repeatable? - 04/10/25

Highlights
• The repeatability of orthodontic responses generated by large language models (LLMs) over time is of significant importance.
• While ChatGPT-3.5 demonstrated the highest level of consistency, the Gemini models exhibited moderate repeatability.
• The temporal variability in model performance underscores the need for caution when utilizing AI tools in patient communication.
Summary
Introduction
The objective of this study was to assess the repeatability of orthodontic responses generated by multiple large language models across repeated time points.
Methods
This experimental study assessed the answers provided by ChatGPT-3.5, ChatGPT-4.0, Gemini, and Gemini-Advanced to 40 frequently asked orthodontic questions. Each model was prompted with the same questions at three time points (T0: day 0; T1: day 7; T2: day 14). Two blinded orthodontic experts independently evaluated the responses on a 3-point accuracy scale. Cohen's Kappa was used to assess inter-rater agreement, and the intraclass correlation coefficient (ICC) to assess repeatability. In addition, the Friedman test with Bonferroni post-hoc analysis and Spearman correlation were used for temporal comparisons.
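The two agreement statistics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the abstract does not say whether a weighted kappa was used or which ICC form was chosen, so the sketch assumes unweighted Cohen's kappa and ICC(3,1) (two-way mixed effects, consistency, single measurement), treats the three time points as the repeated occasions, and codes the 3-point scale as 1 to 3. The function names `cohens_kappa` and `icc_3_1` and all example scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the abstract does not say whether a weighted kappa was used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores[i][j] is the score of question i at occasion j (e.g. T0, T1, T2).
    Assumption: the abstract does not state which ICC form was used.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Partition the total sum of squares into questions, occasions, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical 3-point scores (1 to 3), not data from the study.
rater_1 = [3, 3, 2, 1, 3, 2, 3, 1]
rater_2 = [3, 2, 2, 1, 3, 2, 3, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # → 0.619

# Hypothetical scores for five questions at T0, T1, T2.
by_time = [[3, 3, 3], [2, 3, 2], [1, 1, 2], [3, 3, 3], [2, 2, 2]]
print(round(icc_3_1(by_time), 3))  # → 0.75
```

For real analyses, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted kappa and also offers linear or quadratic weighting through its `weights` argument, which may suit an ordinal 3-point scale better.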
Results
Cohen's Kappa values between raters ranged from 0.624 to 0.749, indicating substantial inter-rater agreement. ICC values for repeatability ranged from 0.666 (Gemini) to 0.960 (ChatGPT-3.5). The Friedman test revealed significant differences in model accuracy at T0 and T2 (P<0.001). Post-hoc analysis showed that ChatGPT-3.5 differed significantly from Gemini and Gemini-Advanced. Spearman correlations between time points were positive but weak (ρ=0.284 to 0.383, P<0.001).
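Because responses were scored on a 3-point scale, the score vectors compared across time points contain many ties, and the shortcut formula ρ = 1 − 6Σd²/(n(n² − 1)) is exact only when there are no ties. A common tie-aware definition, assumed here, is the Pearson correlation of the ranks after each group of tied values receives its average rank. The following is a minimal sketch of that computation; the function names `average_ranks` and `spearman_rho` and the example scores are hypothetical, and the sketch computes ρ only, not the P value reported in the abstract.

```python
def average_ranks(values):
    """Rank values 1..n, giving each group of tied values its mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; share their mean.
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when either sequence is constant")
    return cov / (sd_x * sd_y)


# Hypothetical 3-point scores for the same eight questions at T0 and T1.
t0 = [3, 2, 3, 1, 2, 3, 3, 2]
t1 = [3, 3, 2, 1, 2, 3, 3, 1]
print(round(spearman_rho(t0, t1), 3))
```

SciPy's `scipy.stats.spearmanr` also assigns average ranks to ties and additionally returns a P value; the hand-written version above only makes the tie handling visible.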
Conclusions
The study revealed statistically significant differences in repeatability among AI models. Despite high accuracy, some models exhibited limited consistency over time. These findings underscore the importance of evaluating both accuracy and temporal stability when integrating AI systems into clinical orthodontic communication.
Keywords: Large language models, Acquiring knowledge, Repeatable
Vol 24 - No. 1
Article 101071 - March 2026
