Enhancing prediction models' performance for breast cancer using SMOTE technique
2023 3rd International Conference on Emerging Smart Technologies and Applications (eSmarTA) · 2023 · pp. 1–8
Abstract
Breast cancer (BC) is a critical public health concern, and accurate prediction models are crucial for early detection. Imbalanced datasets, however, make accurate prediction difficult. This study aims to improve the performance of BC prediction models by applying the Synthetic Minority Over-sampling Technique (SMOTE) to correct the imbalance in the target class of the dataset. The models are evaluated under two approaches: the first uses the original Breast Cancer Coimbra Dataset (BCCD), and the second uses the BCCD after SMOTE has balanced the target class. Comparing the two approaches shows that SMOTE improves the accuracy of the BC prediction models. For instance, the Fine Tree, Coarse Tree, and Medium Tree models achieved accuracies of 60.9%, 52.2%, and 60.9%, respectively, with SMOTE. The Quadratic SVM and Cubic SVM models each achieved an accuracy of 73.9% with SMOTE. The Fine Gaussian SVM model achieved 65.2% without SMOTE and 80% with SMOTE. Similarly, the Coarse Gaussian SVM model achieved 52.2% without SMOTE and 60% with SMOTE. The Medium KNN and Weighted KNN models both achieved 73.9% without SMOTE and 76% with SMOTE. Bagged Trees achieved 69.6% without SMOTE and 80% with SMOTE, while Subspace Discriminant achieved 73.9% without SMOTE and 80% with SMOTE. The Optimized LogitBoost model achieved 73.9% without SMOTE and 88% with SMOTE, and AdaBoost tuned with Bayesian optimization achieved 52.2% without SMOTE and 76% with SMOTE. These results indicate that balancing the dataset with SMOTE improves the accuracy of BC prediction models.
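The balancing step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is not the implementation used in the study. The function names `smote` and `balance`, the default k = 5, the fixed seed, and the toy data are all assumptions made for illustration, and the sketch handles only a two-class target.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample the smaller of two classes until both have equal counts."""
    (_, n_major), (minor, n_minor) = Counter(y).most_common(2)
    minority = [x for x, label in zip(X, y) if label == minor]
    new_points = smote(minority, n_major - n_minor, k=k, seed=seed)
    return X + new_points, y + [minor] * len(new_points)


# Toy two-feature data (made up for illustration, not taken from the BCCD).
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.8],
     [3.0, 3.0], [3.2, 2.9]]
y = [0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y)
print(Counter(y), "->", Counter(y_bal))
```

In a comparison like the one reported here, a common precaution is to oversample only the training portion of the data, so that no synthetic point derived from a test sample influences training.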