A Data-Driven Framework for Diabetes Prediction: Machine Learning-Based Comparison of Invasive and Non-Invasive Screening

Authors

  • Sasipatcha Hanmanop College of Biomedical Engineering, Rangsit University, Pathum Thani 12000, Thailand
  • Tatpol Jongsiri College of Biomedical Engineering, Rangsit University, Pathum Thani 12000, Thailand
  • Kittitat Waiprasit College of Biomedical Engineering, Rangsit University, Pathum Thani 12000, Thailand
  • Suejit Pechprasarn College of Biomedical Engineering, Rangsit University, Pathum Thani 12000, Thailand & Center of Excellence in AI and Supercomputing, Rangsit University, Pathum Thani 12000, Thailand https://orcid.org/0000-0001-9105-8627

DOI:

https://doi.org/10.59796/jcst.V16N1.2026.159

Keywords:

artificial intelligence, diabetes, feature selection method, dimension reduction method, machine learning, non-blood test diabetes prediction

Abstract

This study evaluated the performance of 24 machine learning models for diabetes prediction using eight predictors: sex, age, heart disease, hypertension, smoking history, body mass index (BMI), hemoglobin A1c (HbA1c) level, and blood glucose level, obtained from an open data source, Kaggle. The data were curated, cleaned, and balanced before training to reduce bias. Three data splitting ratios were examined: 70/30, 80/20, and 90/10. The prediction target was a binary diabetes category: 0 (non-diabetes) or 1 (diabetes). The Ensemble Boosted Trees model with a 70/30 data splitting ratio performed best, achieving an accuracy of 91.45%, precision of 91.29%, recall of 91.65%, and F1-score of 91.37%. Feature selection methods, namely Chi-Square (χ²), ANOVA, and Kruskal-Wallis, together with principal component analysis, were applied to reduce the complexity and dimensionality of the model. They identified the following parameters as significant for diabetes diagnosis: (1) HbA1c, (2) blood glucose, (3) BMI, and (4) age. The first two parameters are crucial for medical practitioners to determine whether a patient has diabetes; however, they are invasive because they can only be obtained from blood test results. We therefore also examined the accuracy of the machine learning model in predicting diabetes without the invasive predictors, namely blood glucose and HbA1c. A simplified model using age and BMI still yielded a reasonable accuracy of 74.65%, demonstrating the feasibility of non-invasive, non-blood-test screening, especially in resource-limited settings, with age and BMI as the key non-blood-test predictors.
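
The four performance measures reported above can be illustrated with a minimal sketch for a binary label (0 = non-diabetes, 1 = diabetes). This is not the authors' implementation; the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and chosen only for illustration. When metrics are macro-averaged over classes or over folds, a reported F1-score need not equal the harmonic mean of the reported precision and recall.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class.

    Hypothetical helper for illustration; labels are assumed to be
    0 (non-diabetes) or 1 (diabetes).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy example: 3 true positives, 1 false negative,
# 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
# BinaryMetrics(accuracy=0.75, precision=0.75, recall=0.75, f1=0.75)
```

Computing all four measures from the same confusion-matrix counts keeps them mutually consistent, which makes comparisons across models or data splitting ratios easier to audit.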

References

Alam, S., Hasan, M. K., Neaz, S., Hussain, N., Hossain, M. F., & Rahman, T. (2021). Diabetes mellitus: Insights from epidemiology, biochemistry, risk factors, diagnosis, complications, and comprehensive management. Diabetology, 2(2), 36-50. https://doi.org/10.3390/diabetology2020004

Anupongongarch, P., Kaewgun, T., O’Reilly, J. A., & Suraamornkul, S. (2022). Design and construction of a non-invasive blood glucose and heart rate meter by photoplethysmography. Journal of Current Science and Technology, 12(1), 89–101. https://doi.org/10.14456/jcst.2022.9

Bergman, M., Abdul-Ghani, M., Neves, J. S., Monteiro, M. P., Medina, J. L., Dorcely, B., & Buysschaert, M. (2020). Pitfalls of HbA1c in the diagnosis of diabetes. The Journal of Clinical Endocrinology & Metabolism, 105(8), 2803-2811. https://doi.org/10.1210/clinem/dgaa372

Boadu, A. A., Yeboah-Manu, M., Osei-Wusu, S., & Yeboah-Manu, D. (2024). Tuberculosis and diabetes mellitus: The complexity of the comorbid interactions. International Journal of Infectious Diseases, 146, Article 107140. https://doi.org/10.1016/j.ijid.2024.107140

Buzzetti, R., Zampetti, S., & Maddaloni, E. (2017). Adult-onset autoimmune diabetes: Current knowledge and implications for management. Nature Reviews Endocrinology, 13(11), 674-686. https://doi.org/10.1038/nrendo.2017.99

Calibo, M. B. T. (2024). Treatment of chronic and severe diabetes mellitus with ketoacidosis in a four-year-old intact female American Pit Bull Terrier. Asian Journal of Research in Animal and Veterinary Sciences, 7(2), 109-121. https://doi.org/10.9734/ajravs/2024/v7i2291

Carmichael, J., Fadavi, H., Ishibashi, F., Shore, A. C., & Tavakoli, M. (2021). Advances in screening, early diagnosis and accurate staging of diabetic neuropathy. Frontiers in Endocrinology, 12, Article 671257. https://doi.org/10.3389/fendo.2021.671257

Chapakiya, I., Traisuwan, A., Chumpong, S., & Chumpong, K. (2025). Follow-up period classification of type 2 diabetes patients using data mining techniques. Journal of Health Science and Medical Research, 43(2), Article e20241083. https://doi.org/10.31584/jhsmr.20241083

Dagliati, A., Marini, S., Sacchi, L., Cogni, G., Teliti, M., Tibollo, V., ... & Bellazzi, R. (2018). Machine learning methods to predict diabetes complications. Journal of Diabetes Science and Technology, 12(2), 295-302. https://doi.org/10.1177/1932296817706375

Davidson, K. W., Barry, M. J., Mangione, C. M., Cabana, M., Caughey, A. B., Davis, E. M., ... & US Preventive Services Task Force. (2021). Screening for prediabetes and type 2 diabetes: US Preventive Services Task Force recommendation statement. JAMA, 326(8), 736-743. https://doi.org/10.1001/jama.2021.12531

Dritsas, E., & Trigka, M. (2022). Data-driven machine-learning methods for diabetes risk prediction. Sensors, 22(14), Article 5304. https://doi.org/10.3390/s22145304

Ghosh, P., Azam, S., Karim, A., Hassan, M., Roy, K., & Jonkman, M. (2021). A comparative study of different machine learning tools in detecting diabetes. Procedia Computer Science, 192, 467-477. https://doi.org/10.1016/j.procs.2021.08.048

Jalilian, H., Javanshir, E., Torkzadeh, L., Fehresti, S., Mir, N., Heidari‐Jamebozorgi, M., & Heydari, S. (2023). Prevalence of type 2 diabetes complications and its association with diet knowledge and skills and self‐care barriers in Tabriz, Iran: A cross‐sectional study. Health Science Reports, 6(2), Article e1096. https://doi.org/10.1002/hsr2.1096

Khanam, J. J., & Foo, S. Y. (2021). A comparison of machine learning algorithms for diabetes prediction. ICT Express, 7(4), 432-439. https://doi.org/10.1016/j.icte.2021.02.004

Liu, H., Wang, A., Hu, X., Kang, S., Hu, X., Mu, Y., Wang, Y., & Lyu, Z. (2025). The effects of glycated hemoglobin and body mass index on the relationship between the hemoglobin glycation index and the hypoglycemia risk: A moderated mediation analysis. Metabolism and Target Organ Damage, 5, Article 43. https://doi.org/10.20517/mtod.2025.74

Looareesuwan, P., Boonmanunt, S., Thammasudjarit, R., Siriyotha, S., Pattanaprateep, O., Lukkunaprasit, T., Nimitphong, H., Reutrakul, S., Attia, J., McKay, G., & Thakkinstian, A. (2023). Retinopathy prediction in type 2 diabetes: Time-varying Cox proportional hazards and machine learning models. Informatics in Medicine Unlocked, 40, Article 101285. https://doi.org/10.1016/j.imu.2023.101285

Mustafa, M. (2023). A Comprehensive Dataset for Predicting Diabetes with Medical & Demographic Data. Retrieved from https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset

Pechprasarn, S., Manavibool, L., Supmool, N., Vechpanich, N., & Meepadung, P. (2023). Predicting Parkinson’s Disease severity using telemonitoring data and machine learning models: A principal component analysis-based approach for remote healthcare services during the COVID-19 pandemic. Journal of Current Science and Technology, 13(2), 465–485. https://doi.org/10.59796/jcst.V13N2.2023.694

Pechprasarn, S., Srisaranon, N., & Yimluean, P. (2025). Optimizing diabetes prediction: An evaluation of machine learning models through strategic feature selection. Journal of Current Science and Technology, 15(1), Article 75. https://doi.org/10.59796/jcst.V15N1.2025.75

Pippitt, K., Li, M., & Gurgle, H. E. (2016). Diabetes mellitus: Screening and diagnosis. American Family Physician, 93(2), 103-109.

Poorani, K., Balakannan, S. P., & Karuppasamy, M. (2025). Mitigating data imbalance for robust diabetes diagnosis using machine learning and explainable artificial intelligence. Journal of Current Science and Technology, 15(3), Article 111. https://doi.org/10.59796/jcst.V15N3.2025.111

Prabhakar, P. K. (2024). Glucose to complications: Understanding secondary effects in diabetes mellitus. Sumatera Medical Journal, 7(2), 87-95. https://doi.org/10.32734/sumej.v7i2.15998

Qin, Y., Wu, J., Xiao, W., Wang, K., Huang, A., Liu, B., ... & Ren, Z. (2022). Machine learning models for data-driven prediction of diabetes by lifestyle type. International Journal of Environmental Research and Public Health, 19(22), Article 15027. https://doi.org/10.3390/ijerph192215027

Rani, K. J. (2020). Diabetes prediction using machine learning. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 6(4), 294-305. https://doi.org/10.32628/CSEIT206463

Sarkar, B. K., Akter, R., Das, J., Das, A., Modak, P., Halder, S., ... & Kundu, S. K. (2019). Diabetes mellitus: A comprehensive review. Journal of Pharmacognosy and Phytochemistry, 8(6), 2362-2371.

Sinsophonphap, T., & Thavornsawadi, K. (2022). The cut-off value of HbA1c for prediabetes and diabetes among obese children and adolescents. Vajira Medical Journal: Journal of Urban Medicine, 66(4), 299–310. https://doi.org/10.14456/vmj.2022.30

Skyler, J. S., Bakris, G. L., Bonifacio, E., Darsow, T., Eckel, R. H., Groop, L., ... & Ratner, R. E. (2017). Differentiation of diabetes by pathophysiology, natural history, and prognosis. Diabetes, 66(2), 241-255. https://doi.org/10.2337/db16-0806

Tiwari, D., & Aw, T. C. (2024). The 2024 American Diabetes Association guidelines on standards of medical care in diabetes: Key takeaways for laboratory. Exploration of Endocrine and Metabolic Diseases, 1(4), 158-166. https://doi.org/10.37349/eemd.2024.00013

Zhang, J., Zhang, Z., Zhang, K., Ge, X., Sun, R., & Zhai, X. (2023). Early detection of type 2 diabetes risk: Limitations of current diagnostic criteria. Frontiers in Endocrinology, 14, Article 1260623. https://doi.org/10.3389/fendo.2023.1260623

Published

2025-12-25

How to Cite

Hanmanop, S., Jongsiri, T., Waiprasit, K., & Pechprasarn, S. (2025). A Data-Driven Framework for Diabetes Prediction: Machine Learning-Based Comparison of Invasive and Non-Invasive Screening. Journal of Current Science and Technology, 16(1), 159. https://doi.org/10.59796/jcst.V16N1.2026.159