Data redundancy removal using K-MAD based self-tuning spectral clustering and CKD prediction using ML techniques

Authors

  • P. Pradeepa, Department of Computer Applications, Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu 629180, India
  • M. K. Jeyakumar, Department of Computer Applications, Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu 629180, India

Keywords:

ANN, chronic kidney disease, DNN, KNN, machine learning algorithm, redundant self-tuning spectral clustering, SVM

Abstract

Chronic kidney disease (CKD) is a complex disorder characterized by gradual degradation of kidney function. Patients suffer, and may die, from several long-term complications such as high blood pressure and heart and bone diseases. Hence, various automated methods have been developed to detect the disease at an early stage. Still, in many existing methods the prediction is inaccurate, so patients with mild signs of CKD are classified as severe and undergo CKD treatment. This is attributed to the dataset's size and redundancy. To overcome these concerns, this paper focuses on increasing the prediction accuracy of CKD using an effective data mining approach. To reduce redundancy and high data dimensionality, the K-MAD based self-tuning spectral clustering (KSSC) technique was implemented. The self-tuning algorithm was designed to arrange data according to requirements and eliminate unnecessary data, resulting in a smaller data dimension. The dimension-reduced data were then evaluated with several machine learning (ML) classifiers: Random Forest (RF), Artificial Neural Network (ANN), Deep Neural Network (DNN), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). The proposed technique was tested in a Python environment using performance metrics such as precision, F1-score, sensitivity, accuracy, specificity, and recall. The comparison study reveals that KNN and SVM, applied with the clustering method, deliver superior CKD predictions and attained 96% accuracy. Thus, the proposed KSSC extracts essential information from healthcare centre and patient data, which can assist physicians in improving the accuracy of CKD diagnosis before the condition becomes severe.
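The performance metrics named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `binary_metrics` and the sample labels are hypothetical, and the labels are invented purely for illustration (1 = CKD, 0 = not CKD). Note that for binary classification, sensitivity and recall are the same quantity.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary-classification metrics from two label lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # identical to recall
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1_score": f1,
    }


# Invented example labels (assumption): 1 = CKD, 0 = not CKD
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In practice, a library such as scikit-learn provides equivalent functions; the plain-Python version is used here only to make each definition explicit.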

Published

2023-02-12

How to Cite

P. Pradeepa, & M. K. Jeyakumar. (2023). Data redundancy removal using K-MAD based self-tuning spectral clustering and CKD prediction using ML techniques. Journal of Current Science and Technology, 12(3), 517–537. Retrieved from https://ph04.tci-thaijo.org/index.php/JCST/article/view/291

Section

Research Article