IMBALANCED HANDLING FOR STROKE PREDICTION USING OVERSAMPLING AND COST-SENSITIVE LEARNING

Agung L, Triyan - 237110017 (2025) IMBALANCED HANDLING FOR STROKE PREDICTION USING OVERSAMPLING AND COST-SENSITIVE LEARNING. Masters thesis, UNIVERSITAS TEKNOLOGI DIGITAL INDONESIA (UTDI).

Abstract

Predictive analysis of stroke using machine learning (ML) is a promising approach for early detection and for reducing the number of stroke patients. However, the class imbalance inherent in medical datasets poses a significant challenge, often causing models to miss minority-class cases such as stroke. This study evaluates and compares two popular techniques for addressing class imbalance in the context of stroke prediction: oversampling with the Synthetic Minority Oversampling Technique (SMOTE) and cost-sensitive learning. Using the public Kaggle stroke dataset, three ML algorithms (Random Forest, Support Vector Machine (SVM), and XGBoost) were trained and tested in three scenarios: baseline (without balancing), SMOTE, and cost-sensitive learning. The results show that both balancing techniques significantly improve recall for the minority class, particularly for the SVM model, but at the cost of reduced precision and overall accuracy. Feature importance analysis using SHAP identified age and hypertension as the most important factors in predicting stroke, consistent with previous research findings. Despite these improvements, the study highlights a trade-off between sensitivity and precision that must be considered before practical application in medical decision support systems. Future research should explore hybrid approaches and validate the results on larger and more diverse datasets.

Keywords: Stroke prediction, class imbalance, SMOTE, cost-sensitive learning, machine learning, feature importance, SHAP
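
The two balancing ideas named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the thesis's code: the record does not say which libraries or settings were used, and the function names `smote` and `balanced_class_weights`, the toy feature values, and the 95/5 class split are all hypothetical. The first function follows the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). The second computes the common "balanced" class-weight heuristic, n_samples / (n_classes * class_count), which is one way, assumed here, to make misclassifying the minority class more costly during training.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each on the line segment between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (the core SMOTE idea; continuous features only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so the
    rarer class receives a proportionally larger misclassification cost."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy data: (age, average glucose) for a few stroke cases.
minority = [(67.0, 228.7), (80.0, 105.9), (74.0, 70.1), (69.0, 94.4)]
synthetic = smote(minority, n_new=6)

# Hypothetical label vector: 95 non-stroke (0) and 5 stroke (1) records.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

print(len(synthetic), "synthetic minority points")
print("class weights:", weights)
```

In the oversampling scenario the synthetic points would be appended to the training split only, so the test set keeps its original class distribution. In the cost-sensitive scenario the training data is left unchanged and the weights are passed to the learner instead.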

Item Type: Thesis (Masters)
Additional Information: Supervisor: Dr. Widyastuti Andriyani, S.Kom., M.Kom
Uncontrolled Keywords: Stroke prediction, class imbalance, SMOTE, cost-sensitive learning, machine learning, feature importance, SHAP
Subjects: A Karya Umum (General) > Ilmu Komputer (Computer Science) > Analisis Sistem
A Karya Umum (General) > Ilmu Komputer (Computer Science) > E-Learning
A Karya Umum (General) > Ilmu Komputer (Computer Science) > Vector Machine
Divisions: Tesis (S2) > Teknologi Informasi (S2)
Depositing User: Mr. Andi Setyanto
Date Deposited: 12 Sep 2025 03:25
Last Modified: 12 Sep 2025 03:25
URI: http://eprints.utdi.ac.id/id/eprint/10880
