Optimizing Lung Cancer Diagnosis with Machine Learning and Feature Selection Methods
DOI:
https://doi.org/10.59796/jcst.V14N3.2024.55
Keywords:
Lung Cancer, machine learning, smart diagnosis tool, artificial intelligence, feature selection, classification
Abstract
Lung cancer is a prevalent disease, with nearly 238,000 new cases diagnosed in 2023. This study utilizes clinical predictors from a Kaggle dataset containing 309 observations across 15 variables to aid in lung cancer diagnosis. The variables include swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, chronic disease, chest pain, coughing, fatigue, smoking, age, and shortness of breath. The research aims to develop and compare various supervised machine learning models for classifying and predicting lung cancer, while also identifying key clinical tests and parameters using unsupervised statistical models. The dataset was divided into training and test sets, balanced, and preprocessed for unbiased training. Feature selection and machine learning models were applied to identify crucial predictors. The study explored tree models, logistic regression, Naïve Bayes, support vector machine (SVM), ensemble, neural network, and kernel models. Among these, the linear SVM achieved the highest accuracy of 93.75% with 5-fold cross-validation. However, it showed overfitting, with a lower test accuracy of 82.55%. The Gaussian Naïve Bayes model emerged as the optimal choice, providing consistent performance between validation and test cases. It achieved the highest cross-validation classification accuracy of 82.81% using only 9 variables: swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, and chronic disease. This model allows for effective training with fewer predictors without compromising classification
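The validation-versus-test comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a from-scratch Gaussian Naïve Bayes classifier, synthetic continuous data in place of the Kaggle dataset (whose predictors are mostly binary), and hypothetical helper names (`fit_gnb`, `predict_gnb`, `cross_val_accuracy`, `make_synthetic`). It shows only the general pattern of estimating accuracy by 5-fold cross-validation on a balanced training set and then checking it against a held-out test set.

```python
import math
import random
from statistics import mean


def fit_gnb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [mean(col) for col in zip(*rows)]
        # A small variance floor keeps the log-density finite for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the largest log-posterior under the Gaussian model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def accuracy(model, X, y):
    """Fraction of samples classified correctly."""
    return sum(predict_gnb(model, x) == t for x, t in zip(X, y)) / len(y)


def cross_val_accuracy(X, y, k=5, seed=0):
    """Return the k per-fold accuracies of k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return scores


def make_synthetic(n=300, n_features=9, seed=1):
    """Synthetic stand-in data (an assumption, not the study's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are balanced
        X.append([rng.gauss(0.8 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


X, y = make_synthetic()
# Hold out the last 20% of samples as a test set; the rest is used for training.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

cv_scores = cross_val_accuracy(X_train, y_train, k=5)
test_acc = accuracy(fit_gnb(X_train, y_train), X_test, y_test)

print(f"5-fold CV accuracy:     {mean(cv_scores):.3f}")
print(f"Held-out test accuracy: {test_acc:.3f}")
```

A large gap between the two printed figures is the kind of signal the abstract interprets as overfitting for the linear SVM, while similar values correspond to the consistent behaviour it reports for Gaussian Naïve Bayes.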
License
Copyright (c) 2024 Journal of Current Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.