Optimizing Heart Disease Prediction Models through SMOTE: Addressing Data Imbalance
2024 4th International Conference on Emerging Smart Technologies and Applications (eSmarTA) · 2024 · pp. 1–10
Abstract
Data imbalance poses a significant challenge in medical diagnostics, particularly in heart disease prediction with machine learning models. This study investigates whether the Synthetic Minority Over-sampling Technique (SMOTE) can address this imbalance and improve the predictive performance of heart disease models. Following a methodology of data collection, preprocessing, and the evaluation of 21 machine learning models, the study compares models trained on the original datasets with models trained on SMOTE-balanced datasets. Accuracy, precision, recall, and F1-score are compared using t-tests to assess the statistical significance of the changes attributable to SMOTE. Models trained on SMOTE-balanced data showed statistically significant improvements in precision, F1-score, and often recall, while the effect on accuracy varied by dataset. The gains in precision and F1-score were particularly evident across the Cleveland and Statlog heart disease datasets. These results indicate that SMOTE can contribute to more effective and reliable heart disease diagnostics.
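The two techniques named in the abstract, SMOTE oversampling and a t-test on performance metrics, can be illustrated with a minimal standard-library sketch. It is not the study's implementation: the function names `smote` and `paired_t_statistic`, the toy feature vectors, and the choice of a paired test are assumptions made here for illustration (the abstract does not say which t-test variant was used). In practice one would more likely rely on `SMOTE` from `imblearn.over_sampling` and `ttest_rel` from `scipy.stats`.

```python
import math
import random
import statistics


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def paired_t_statistic(before, after):
    """t statistic for paired scores (assumption: the same models are scored
    once on the original data and once on the SMOTE-balanced data)."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical toy data: 2 features, 3 minority samples against 8 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority_count = 8

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic sample lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. `paired_t_statistic` would then be called with one metric (for example F1-score) measured per model before and after balancing; the resulting statistic is compared against a t distribution with n − 1 degrees of freedom, where n is the number of models.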