Data redundancy removal using K-MAD based self-tuning spectral clustering and CKD prediction using ML techniques


  • P. Pradeepa, Department of Computer Applications, Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu 629180, India
  • M. K. Jeyakumar, Department of Computer Applications, Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu 629180, India


ANN, chronic kidney disease, DNN, KNN, machine learning algorithm, redundant self-tuning spectral clustering, SVM


Chronic kidney disease (CKD) is one of the most complicated disorders and is characterized by a gradual degradation of kidney function. Patients suffer, and may die, from several long-term complications such as high blood pressure and heart and bone diseases. Hence, various automated early detection methods have been developed to identify the disease at an early stage. Still, many existing methods predict inaccurately, so patients with mild signs of CKD are classified as severe and undergo CKD treatment. This is because of the size and redundancy of the datasets. To overcome these concerns, this paper focuses on increasing the prediction accuracy of CKD using an effective data mining approach. To reduce redundancy and high data dimensionality, this paper implements the K-MAD based self-tuning spectral clustering (KSSC) technique. The self-tuning algorithm was designed to arrange data according to requirements and eliminate unnecessary data, resulting in a smaller data dimension. The dimension-reduced data were then verified with several machine learning (ML) classifiers: Random Forest (RF), Artificial Neural Network (ANN), Deep Neural Network (DNN), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). The proposed technique was tested in a Python environment using several performance metrics: accuracy, precision, sensitivity (recall), specificity, and F1-score. The comparison study reveals that KNN and SVM deliver superior CKD predictions on the clustered data, attaining 96% accuracy. Thus, the proposed KSSC extracts essential information from healthcare centre and patient medical data, which helps physicians improve the accuracy of CKD diagnosis before the condition becomes severe.
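
The performance metrics named in the abstract all derive from the binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function name `binary_metrics` and the label vectors are hypothetical and chosen only for illustration. Sensitivity and recall are the same quantity, TP / (TP + FN), so they are computed once and reported under both names.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary-classification metrics from two label lists.

    Hypothetical helper for illustration; `positive` marks the CKD class.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # identical to recall
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "f1_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up labels (1 = CKD, 0 = not CKD); these are not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Specificity, TN / (TN + FP), is the metric most relevant to the over-treatment concern raised in the abstract, since it measures how often patients without CKD are correctly left unflagged.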






How to Cite

P. Pradeepa, & M. K. Jeyakumar. (2023). Data redundancy removal using K-MAD based self-tuning spectral clustering and CKD prediction using ML techniques. Journal of Current Science and Technology, 12(3), 517–537. Retrieved from



Research Article