Optimizing Lung Cancer Diagnosis with Machine Learning and Feature Selection Methods

Authors

  • Suejit Pechprasarn, College of Biomedical Engineering, Rangsit University, Pathum Thani 12000, Thailand
  • Nichapha Suechoey, Satriwithaya School, Wat Bowon Niwet, Phra Nakhon, Bangkok 10200, Thailand
  • Nutchareeya Pholtrakoolwong, Satriwithaya School, Wat Bowon Niwet, Phra Nakhon, Bangkok 10200, Thailand
  • Pattaraporn Tanedvorapinyo, Satriwithaya School, Wat Bowon Niwet, Phra Nakhon, Bangkok 10200, Thailand
  • Yanisa Toboonliang, Satriwithaya School, Wat Bowon Niwet, Phra Nakhon, Bangkok 10200, Thailand

DOI:

https://doi.org/10.59796/jcst.V14N3.2024.55

Keywords:

lung cancer, machine learning, smart diagnosis tool, artificial intelligence, feature selection, classification

Abstract

Lung cancer is a prevalent disease, with nearly 238,000 new cases diagnosed in 2023. This study uses clinical predictors from a Kaggle dataset containing 309 observations across 15 variables to aid lung cancer diagnosis. The variables are swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, chronic disease, chest pain, coughing, fatigue, smoking, age, and shortness of breath. The research aims to develop and compare supervised machine learning models for classifying and predicting lung cancer, and to identify key clinical tests and parameters using unsupervised statistical models. The dataset was divided into training and test sets, balanced, and preprocessed for unbiased training. Feature selection and machine learning models were then applied to identify the crucial predictors. The study explored tree models, logistic regression, Naïve Bayes, support vector machine (SVM), ensemble, neural network, and kernel models. Among these, the linear SVM achieved the highest 5-fold cross-validation accuracy, 93.75%, but it overfit: its test accuracy dropped to 82.55%. The Gaussian Naïve Bayes model emerged as the optimal choice, performing consistently between validation and test cases. It achieved a cross-validation classification accuracy of 82.81% using only 9 variables: swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, and chronic disease. This model allows for effective training with fewer predictors without compromising classification
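The evaluation described in the abstract, a Gaussian Naïve Bayes classifier scored with 5-fold cross-validation on 9 predictors, can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the data are synthetic binary symptom flags rather than the Kaggle dataset, the function names and the variance floor are hypothetical, and the study's balancing and feature selection steps are not reproduced. It is not the authors' implementation, and its output says nothing about the accuracies reported above.

```python
import math
import random


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate per-class log-priors, feature means, and feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # var_floor (an assumed constant) keeps variances positive
        # when a feature is constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean validation accuracy over k shuffled folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k


def make_synthetic_data(n=300, n_features=9, seed=1):
    """Hypothetical binary symptom flags; NOT the Kaggle data from the study."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        p = 0.7 if label == 1 else 0.3  # assumed symptom rate per class
        X.append([1.0 if rng.random() < p else 0.0 for _ in range(n_features)])
        y.append(label)
    return X, y


X, y = make_synthetic_data()
cv_accuracy = k_fold_accuracy(X, y, k=5)
print(f"5-fold CV accuracy on synthetic data: {cv_accuracy:.3f}")
```

Comparing the cross-validation score with accuracy on a separately held-out test set, as the abstract does for the linear SVM, is what exposes overfitting; this sketch computes only the cross-validation side.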

Published

2024-09-01

How to Cite

Pechprasarn, S., Suechoey, N., Pholtrakoolwong, N., Tanedvorapinyo, P., & Toboonliang, Y. (2024). Optimizing Lung Cancer Diagnosis with Machine Learning and Feature Selection Methods. Journal of Current Science and Technology, 14(3), Article 55. https://doi.org/10.59796/jcst.V14N3.2024.55

Section

Research Article
