Identification of Important Factors in the Diagnosis of Breast Cancer Cells Using Machine Learning Models and Principal Component Analysis
DOI: https://doi.org/10.59796/jcst.V13N3.2023.700
Keywords: breast cancer classification, classification of malignant and benign cells, machine learning, principal component analysis, complexity reduced model, intelligent diagnostic software
Abstract
Breast cancer (BC) is a widespread and growing cause of morbidity and mortality worldwide. This study uses a publicly available clinical dataset of 699 patients from the University of Wisconsin with nine variables: (1) clump thickness, (2) uniformity of cell size, (3) uniformity of cell shape, (4) marginal adhesion, (5) single epithelial cell size, (6) bare nuclei, (7) bland chromatin, (8) normal nucleoli, and (9) mitoses. This dataset has been used in many earlier studies to pinpoint critical factors in patient diagnosis. Here, we first verify that the data are unbiased and accurate. We then apply principal component analysis and machine learning models to identify the factors that distinguish a malignant tumor from a benign one. We compare the classification accuracy of several families of machine learning models: tree, linear discriminant, quadratic discriminant, logistic regression, naive Bayes, support vector machine (SVM), K-nearest neighbor (KNN), ensemble, neural network, and kernel models. The best-performing models are medium Gaussian SVM, coarse Gaussian SVM, and cosine KNN, each with an accuracy of 96.5%. Principal component analysis is then used to identify the most important components and to build an accurate model with fewer parameters. Using only three predictors (normal nucleoli, bare nuclei, and uniformity of cell size), the medium Gaussian SVM achieves the highest cross-validation classification accuracy, 96.98%.
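The abstract says principal component analysis was used to find the crucial components, but it does not describe how components were mapped back to individual predictors. One common approach, assumed here, is to standardize the variables, compute the first principal component of the correlation matrix, and rank the variables by the absolute value of their loadings. The sketch below does this in plain Python with power iteration; the function names and the three-column synthetic dataset are hypothetical stand-ins for the nine Wisconsin variables, and a real analysis would more likely call a library PCA routine.

```python
import math
import random


def standardize(rows):
    """Z-score each column (mean 0, sample std 1); constant columns keep std 1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Sample covariance of centred columns (the correlation matrix for z-scores)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def leading_component(cov, iters=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    p = len(cov)
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigval, v


def rank_features(rows, names):
    """Rank variables by the absolute loading on the first principal component."""
    eigval, loadings = leading_component(covariance(standardize(rows)))
    order = sorted(range(len(names)), key=lambda j: -abs(loadings[j]))
    return eigval, [(names[j], loadings[j]) for j in order]


# Hypothetical synthetic data: feat_b is a noisy copy of feat_a, feat_c is independent.
rng = random.Random(0)
rows = []
for _ in range(200):
    a = rng.gauss(0, 1)
    rows.append([a, a + rng.gauss(0, 0.1), rng.gauss(0, 1)])

eigval, ranked = rank_features(rows, ["feat_a", "feat_b", "feat_c"])
print(f"first eigenvalue: {eigval:.3f}")
for name, loading in ranked:
    print(f"{name}: {loading:+.3f}")
```

Because PCA ignores the class labels, a large loading only means a variable carries much of the shared variance; whether that variable actually helps separate malignant from benign cells still has to be checked with a classifier.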
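To make the cross-validated comparison concrete, here is a minimal sketch of k-fold cross-validation for a K-nearest-neighbor classifier that uses cosine distance, scored once on all columns and once on a chosen subset of columns. Several details are assumptions because the abstract does not state them: 5 folds, 10 neighbors, and plain majority voting. The helper names and the synthetic two-class data are hypothetical, and the medium Gaussian SVM is not reproduced here.

```python
import math
import random
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def knn_predict(train_x, train_y, x, k=10):
    """Majority vote among the k nearest points; ties favour the nearest label."""
    nearest = sorted(range(len(train_x)), key=lambda i: cosine_distance(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_accuracy(xs, ys, n_folds=5, k=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in fold)
    return correct / len(xs)


def select(rows, columns):
    """Keep only the given column indices (a reduced-predictor model)."""
    return [[r[c] for c in columns] for r in rows]


# Hypothetical synthetic data on a 1-10 scale: columns 0 and 1 separate the
# classes, column 2 is pure noise.
rng = random.Random(42)
xs, ys = [], []
for _ in range(40):
    xs.append([rng.uniform(1, 2), rng.uniform(8, 10), rng.uniform(1, 10)])
    ys.append("benign")
    xs.append([rng.uniform(8, 10), rng.uniform(1, 2), rng.uniform(1, 10)])
    ys.append("malignant")

acc_all = k_fold_accuracy(xs, ys)
acc_subset = k_fold_accuracy(select(xs, [0, 1]), ys)
print(f"all columns: {acc_all:.3f}, columns 0-1 only: {acc_subset:.3f}")
```

On the real data, following the abstract's numbering with 0-based indices, the three retained predictors (uniformity of cell size, bare nuclei, normal nucleoli) would be columns 1, 5, and 7, i.e. `select(xs, [1, 5, 7])`; comparing that score with the all-column score mirrors the reduced-model comparison the abstract reports.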
Categories
- Biomedical engineering
- Computing (Computer Science; Computer Engineering) > Artificial Intelligence (AI)
- Computing (Computer Science; Computer Engineering) > Bioinformatics
- Computing (Computer Science; Computer Engineering) > Data Science and Analytics
- Computing (Computer Science; Computer Engineering) > Machine Learning and Intelligent Systems
License
Copyright (c) 2023 Journal of Current Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.