Abstract
Heart disease is a leading cause of mortality worldwide, and its rising prevalence challenges health systems. This study evaluates Decision Tree, k-Nearest Neighbors, and Random Forest classifiers using the Heart Failure Prediction Dataset from Kaggle, which contains 918 records and 12 demographic, clinical, and lifestyle features. The target variable indicates the presence of heart disease. Data preprocessing included cleaning, transformation, and scaling. Hyperparameters were tuned with stratified five-fold cross-validation to prevent data leakage. Performance was assessed using accuracy, precision, recall, F1 score, ROC AUC, PR AUC, Matthews Correlation Coefficient, and Brier score, each estimated with 95% confidence intervals via bootstrap. k-Nearest Neighbors achieved the highest accuracy at 90.2%, followed by Random Forest at 87.5% and Decision Tree at 85.3%. Calibration and decision curve analyses indicated that k-Nearest Neighbors and Random Forest provided better-calibrated probabilities and higher clinical utility across plausible thresholds. The study offers a reproducible evaluation pipeline and supports the use of machine learning for early detection of heart disease, while encouraging future work on larger datasets and more advanced models.
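The bootstrap confidence intervals mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which bootstrap variant the study used, so the percentile method below is an assumption, as are the function names, the number of resamples, and the seed. The sketch uses only the Python standard library and also shows how accuracy, Matthews Correlation Coefficient, and Brier score can be computed for binary labels.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels coded 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant, not confirmed by the paper).

    Resamples test-set indices with replacement, recomputes the metric on
    each resample, and returns the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical toy labels for illustration only; not data from the study.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
toy_ci = bootstrap_ci(toy_true, toy_pred, accuracy)
print("accuracy:", accuracy(toy_true, toy_pred), "95% CI:", toy_ci)
```

On a real held-out test set, the same `bootstrap_ci` call can be repeated with `mcc` or any other metric that takes labels and predictions. With a sample as small as this toy one, the interval is very wide, which is exactly the uncertainty that a point estimate alone would hide.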
Kelvin Leonardi Kohsasih
STMIK TIME
Daniel Smith Sunario
STMIK TIME
Alvin Alvin
STMIK TIME
Fedro Laurendio
STMIK TIME
Citation:
Kohsasih, K. L., Smith Sunario, D., Alvin, A., & Laurendio, F. (2025). Enhancing Early Heart Disease Detection Through Comparative Analysis of Random Forest, Decision Tree, and K-NN Models. IT Journal Research and Development, 10(2), 66–77. https://doi.org/10.25299/itjrd.2025.24703
Publication:
Vol. 10 No. 2 (2025): ITJRD: Journal Research and Development
DOI:
https://doi.org/10.25299/itjrd.2025.24703
Copyright:
Copyright (c) 2025 Kelvin Leonardi Kohsasih, Daniel Smith Sunario, Alvin Alvin, Fedro Laurendio