Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets - 24/06/25

Abstract
Background:
Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and quality-of-life impacts. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potentially compromising real-world model performance in clinical settings.
Method:
This research proposes an alternative approach focusing on recall as the primary evaluation metric for stroke prediction, specifically targeting the reduction of false negatives. In the context of stroke diagnosis, where missed detection can lead to severe consequences or fatality, recall is crucial for directly measuring the model's ability to identify actual stroke cases.
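To make the choice of metric concrete, the following is a minimal sketch of how recall and accuracy can diverge on an imbalanced cohort. The patient counts and both models are hypothetical and are not taken from the paper; the function names are illustrative only.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual stroke cases the model flags: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical cohort (not from the paper): 1000 patients, 50 with stroke.
# Model A predicts "no stroke" for everyone.
model_a = dict(tp=0, tn=950, fp=0, fn=50)
# Model B flags 47 of the 50 strokes at the cost of 200 false alarms.
model_b = dict(tp=47, tn=750, fp=200, fn=3)

for name, m in (("A", model_a), ("B", model_b)):
    acc = accuracy(**m)
    rec = recall(m["tp"], m["fn"])
    print(f"model {name}: accuracy={acc:.3f} recall={rec:.3f}")
```

In this toy setting model A has the higher accuracy yet misses every stroke case, while model B trades accuracy for a much lower number of false negatives, which is the property the method prioritizes.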
Results:
Three top-performing models were identified: Logistic Regression, a soft-voting ensemble (combining Gaussian Naive Bayes and Logistic Regression), and a customized Support Vector Machine, with recall values of 92%, 92%, and 94%, respectively. SHAP was applied to the best-performing model to improve interpretability. Previous methods reported recall values between 5.6% and 40%; the best model in this work reaches 94%. Existing research tends to emphasize accuracy and to rely on oversampling, which is inappropriate for sensitive medical datasets. The trade-off is a slight increase in false positives, which is tolerable because missing an actual stroke case is far costlier than raising a false alarm.
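Soft voting, as used in the ensemble above, averages the class probabilities produced by each member model and then thresholds the result. The sketch below illustrates the idea in plain Python. The probabilities, the 0.5 threshold, and the helper names `soft_vote` and `predict` are assumptions for illustration; the abstract does not describe the authors' implementation.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-patient stroke probabilities across models."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal say for each model
    total = sum(weights)
    n_patients = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_patients)
    ]


def predict(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a patient 1 (stroke) when the averaged probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical stroke probabilities for four patients (not from the paper).
nb_probs = [0.90, 0.40, 0.10, 0.70]  # stand-in for Gaussian Naive Bayes
lr_probs = [0.80, 0.70, 0.20, 0.20]  # stand-in for Logistic Regression

averaged = soft_vote([nb_probs, lr_probs])
labels = predict(averaged)
print(labels)
```

Here the second patient is flagged even though one model alone would have scored them below the threshold, because the other model's confidence pulls the average up. Lowering the threshold would raise recall further at the cost of more false positives.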
Conclusions:
The research demonstrates the effectiveness of focusing on recall as an evaluation metric for stroke prediction, minimizing false negative predictions. To facilitate practical implementation, a mobile application incorporating the best-performing model is provided. It is proposed as a primary screening tool that can support doctors in stroke diagnosis and prediction.
The full text of this article is available in PDF.
Graphical abstract
Highlights
• Depiction of accuracy and weighted measures as inefficient evaluation metrics for the imbalanced stroke prediction dataset.
• Assessment of multiple ML models without oversampling while adopting recall as the proper evaluation metric.
• Integration of XAI through the use of SHAP and a Flutter-based mobile application using the best-performing models.
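One way to see why weighted measures can mislead on imbalanced data: recall averaged with class-support weights is arithmetically the same as accuracy, so the majority class dominates it. The sketch below uses hypothetical labels (95 non-stroke, 5 stroke) that are not from the paper, and the helper names are illustrative.

```python
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall for each class: correctly predicted members / actual members."""
    result = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        result[label] = sum(1 for i in idx if y_pred[i] == label) / len(idx)
    return result


def weighted_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class recall averaged with weights proportional to class size."""
    recalls = per_class_recall(y_true, y_pred)
    n = len(y_true)
    return sum(r * list(y_true).count(c) / n for c, r in recalls.items())


# Hypothetical labels: 0 = no stroke, 1 = stroke. The model finds 1 of 5 strokes.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4

by_class = per_class_recall(y_true, y_pred)
weighted = weighted_recall(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(by_class, round(weighted, 3), round(plain_accuracy, 3))
```

The weighted figure looks strong while the stroke-class recall in `by_class` shows that most stroke cases were missed, which is why the minority-class recall is the number to report.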
Keywords: Brain stroke prediction, Data imbalance, Recall, SHAP
Vol 5 - N° 3
Article 100215 - September 2025
