Optimizing Diabetes Prediction: An Evaluation of Machine Learning Models Through Strategic Feature Selection
DOI:
https://doi.org/10.59796/jcst.V15N1.2025.75
Keywords:
diabetes diagnosis, machine learning, feature selection, complexity-reduced model, intelligent diagnostic software
Abstract
Diabetes, a widespread chronic ailment in the United States, imposes significant economic and health burdens, impacting quality of life and life expectancy. This study analyzes a clinical dataset of 253,680 patients from the Behavioral Risk Factor Surveillance System (BRFSS). The dataset encompasses 21 predictors, including high blood pressure, cholesterol, body mass index (BMI), smoking, stroke, heart disease, physical activity, fruit consumption, vegetable consumption, alcohol consumption, insurance coverage, lack of medical visits due to financial constraints, general health, days with mental health issues, days with physical injuries in the past 30 days, difficulties in walking, gender, age, income, and education level. The objective is to balance the training dataset, compare different supervised machine learning models, and identify critical clinical features contributing to diabetes using unsupervised feature selection methods. A total of 34 machine learning models were trained and compared in MATLAB R2023a. Quadratic Support Vector Machine (SVM), Coarse Gaussian SVM, and Narrow Neural Network models achieved the highest training accuracy (76.3%), while the Bilayered Neural Network attained 74.7% on an unseen test dataset. Among all models, Quadratic SVM demonstrated the best overall performance based on average accuracy, precision, recall, and F1 score. Feature selection highlighted nine key predictors: high blood pressure, high cholesterol, BMI, heart disease, physical activity, general health, recent bodily injuries, mobility issues, and age. A model trained on these nine features achieved an accuracy of 75.4%, demonstrating the feasibility of a simplified, efficient diagnostic tool with a diagnostic efficacy of 0.7.
This study underscores the potential of streamlined models to predict diabetes with fewer parameters while maintaining high accuracy, offering a valuable tool for healthcare diagnostics.
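The abstract ranks models by accuracy, precision, recall, and F1 score. As a minimal sketch of how these four metrics relate for a binary diabetes/no-diabetes classifier, the Python snippet below computes them from true and predicted labels. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not the study's MATLAB code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the diabetic class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up): 1 = diabetes, 0 = no diabetes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced test set like this toy one, accuracy alone is a reasonable summary; on imbalanced data, precision and recall on the positive class tend to be more informative, which is one reason to report all four.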
License
Copyright (c) 2024 Journal of Current Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.