External validation of a commercially available deep learning algorithm for fracture detection in children - 20/11/21
Highlights |
• | Deep learning algorithms lack real-world external validation prior to clinical use. |
• | The tested deep learning algorithm shows strong diagnostic performance in children. |
• | Sensitivity of the tested algorithm is lower in children under 4 years. |
Abstract |
Purpose |
The purpose of this study was to conduct an external validation of a fracture assessment deep learning algorithm (Rayvolve®) using digital radiographs from a real-life cohort of children presenting routinely to the emergency room.
Materials and methods |
This retrospective study was conducted on 2634 radiography sets (5865 images) from 2549 children (1459 boys, 1090 girls; mean age, 8.5 ± 4.5 [SD] years; age range: 0–17 years) referred by the pediatric emergency room for trauma. For each set, the findings of the senior radiologists and of the algorithm were recorded: whether one or more fractures were present, the number of fractures, and their location. Using the senior radiologist diagnosis as the standard of reference, the diagnostic performance of the deep learning algorithm (Rayvolve®) was calculated using three different approaches: a detection approach (presence/absence of a fracture as a binary variable), an enumeration approach (exact number of fractures detected) and a localization approach (whether the detected fractures were correctly localized). Subgroup analyses were performed according to the presence or absence of a cast, age category (0–4 vs. 5–18 years) and anatomical region.
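As an illustration of the detection approach described above, the following minimal Python sketch derives the usual diagnostic metrics from a 2×2 table in which each radiography set is classified against the senior radiologist reading. It is not the authors' analysis code. The function names `wilson_ci` and `diagnostic_metrics` are hypothetical, the counts in the usage example are invented for illustration and are not study data, and the Wilson score interval is an assumption, since the abstract does not state which confidence interval method was used.

```python
from math import sqrt, inf


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96).

    Assumption: the abstract does not name its CI method; Wilson is one
    common choice for binomial proportions.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, tn, fn):
    """Metrics for a binary test scored against a reference standard.

    tp, fp, tn, fn are counts of radiography sets, with the senior
    radiologist reading taken as the reference.
    """
    sens = tp / (tp + fn)  # fractures correctly flagged
    spec = tn / (tn + fp)  # normal sets correctly cleared
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios; guard the degenerate cases.
        "lr_plus": sens / (1 - spec) if spec < 1 else inf,
        "lr_minus": (1 - sens) / spec if spec > 0 else inf,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, NOT taken from the study.
    counts = {"tp": 90, "fp": 10, "tn": 180, "fn": 20}
    for name, value in diagnostic_metrics(**counts).items():
        print(f"{name}: {value:.3f}")
    low, high = wilson_ci(counts["tp"], counts["tp"] + counts["fn"])
    print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
```

The enumeration and localization approaches would use the same formulas; only the rule deciding whether a set counts as a true or false result changes (exact fracture count, or correct location, rather than simple presence or absence).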
Results |
For the detection approach, the deep learning algorithm yielded 95.7% sensitivity (95% CI: 94.0–96.9), 91.2% specificity (95% CI: 89.8–92.5) and 92.6% accuracy (95% CI: 91.5–93.6). For both the enumeration and localization approaches, it yielded 94.1% sensitivity (95% CI: 92.1–95.6), 88.8% specificity (95% CI: 87.3–90.2) and 90.4% accuracy (95% CI: 89.2–91.5). In the age-related subgroup analyses, the algorithm yielded greater sensitivity and negative predictive value in the 5–18-years age group than in the 0–4-years age group for the detection approach (P < 0.001 and P = 0.002) and for the enumeration and localization approaches (P = 0.012 and P = 0.028). The high negative predictive value was robust, persisting in all of the subgroup analyses except for patients with casts (P = 0.001 for the detection approach and P < 0.001 for the enumeration and localization approaches).
Conclusion |
The Rayvolve® deep learning algorithm is very reliable for detecting fractures in children, especially in those older than 4 years and without a cast.
Keywords: Radiographs, Deep learning algorithm, Artificial intelligence, Fractures, Pediatric
Abbreviations: CI, DICOM, FN, FP, LR-, LR+, NPV, PACS, PPV, ROI, TN, TP