A Data-Driven Framework for Diabetes Prediction: Machine Learning-Based Comparison of Invasive and Non-Invasive Screening
DOI:
https://doi.org/10.59796/jcst.V16N1.2026.159
Keywords:
artificial intelligence, diabetes, feature selection method, dimension reduction method, machine learning, non-blood test diabetes prediction
Abstract
This study evaluated the performance of 24 machine learning models for diabetes prediction using eight predictors: sex, age, heart disease, hypertension, smoking history, body mass index (BMI), hemoglobin A1c (HbA1c) level, and blood glucose level, obtained from an open data source, Kaggle. Data preparation involved curating and cleaning the data to ensure unbiased training and a balanced dataset before model training. The research examined data splitting ratios of 70/30, 80/20, and 90/10. The prediction task was binary classification of diabetes status: 0 (non-diabetes) and 1 (diabetes). The performance metrics indicated that the Ensemble Boosted Trees model, particularly with a 70/30 data splitting ratio, achieved the highest accuracy of 91.45%, with a precision of 91.29%, recall of 91.65%, and F1-score of 91.37%. Feature selection and dimension reduction methods, including Chi-Square (χ²), ANOVA, Kruskal-Wallis, and principal component analysis, were applied to reduce the complexity and dimensionality of the model; the following parameters were found to be significant for diabetes diagnosis: (1) HbA1c, (2) blood glucose, (3) BMI, and (4) age. The first two parameters are crucial for medical practitioners to determine whether a patient has diabetes; however, they are invasive and can only be obtained from blood test results. We therefore also examined the accuracy of the machine learning model in predicting diabetes without the invasive predictors, namely, blood glucose and HbA1c. Our simplified model using only age and BMI still yielded a reasonable accuracy of 74.65%, demonstrating the feasibility of non-invasive, non-blood test screening, especially in resource-limited settings.
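The four performance metrics reported above and the three splitting ratios can be illustrated with a minimal, dependency-free Python sketch. It is not the study's code: the function names `binary_metrics` and `split_sizes`, the toy labels, and the sample count of 1000 are assumptions made purely for illustration, and the metrics are computed for the positive class (1 = diabetes) only, which may differ from the averaging used in the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def split_sizes(n_samples, train_fraction):
    """Return (n_train, n_test) for a training fraction, e.g. 0.7 for a 70/30 split."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


if __name__ == "__main__":
    # Toy labels invented for illustration; they are not taken from the study's dataset.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
    for fraction in (0.7, 0.8, 0.9):
        print(fraction, split_sizes(1000, fraction))
```

With the toy labels there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 0.8 and precision, recall, and F1 are each 0.75.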
Categories
- Computing (Computer Science; Computer Engineering) > Artificial Intelligence (AI)
- Computing (Computer Science; Computer Engineering) > Bioinformatics
- Computing (Computer Science; Computer Engineering) > Data Science and Analytics
- Computing (Computer Science; Computer Engineering) > Machine Learning and Intelligent Systems
License
Copyright (c) 2025 Journal of Current Science and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.



