Advancing Breast Cancer Detection: A Comparison of PCA and LDA Methods in Analyzing Ultrasound Imagery
DOI:
https://doi.org/10.59796/jcst.V15N3.2025.125
Keywords:
breast cancer classification, principal component analysis, linear discriminant analysis, feature extraction
Abstract
Early and accurate detection of breast cancer via ultrasound imaging is essential, yet the high dimensionality of raw ultrasound features can hinder classifier performance and increase computational burden. This study compares Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) for feature reduction in a breast-cancer ultrasound diagnostic pipeline, with t-SNE used for exploratory visualization. The research utilized 1,200 breast ultrasound images (400 benign, 400 malignant, and 400 normal) obtained from Baheya Hospital (Cairo, Egypt). Minority classes were balanced using data augmentation techniques such as rotation and flipping. PCA reduced the data to 172 components, preserving 90% of the data variance, while LDA used two components. t-SNE generated a two-dimensional visual representation. Three classifiers, Support Vector Machine (SVM), Random Forest (RF), and XGBoost, were trained on: (a) the full feature set, (b) PCA-reduced data, and (c) LDA-reduced data. Evaluation metrics included precision, recall, and F1-score. Compression ratio and signal-to-noise ratio (SNR) were used to assess image compression via PCA. Without reduction, XGBoost achieved the highest F1-score (76.97%), with a precision of 77.40% and a recall of 76.55%. PCA yielded a modest precision gain (XGBoost: 78.65%) but reduced recall and the net F1-score (76.37%). LDA significantly degraded performance (XGBoost F1: 63.99%; RF F1: 60.13%; SVM F1: 42.05%). PCA compression reduced image size by 2.68x with an SNR of 48.91 dB, while LDA offered no compression benefit. t-SNE visualization revealed clear non-linear class clusters, underscoring the dataset’s intrinsic complexity. For ultrasound-based breast cancer diagnosis, preserving the full high-dimensional feature set and using a powerful non-linear model (e.g., XGBoost) yields the best accuracy. PCA is best reserved for storage or runtime efficiency, LDA for scenarios with very low dimensional constraints, and t-SNE for exploratory data analysis.
This comparative study highlights that dimensionality reduction may harm performance in complex imaging data and recommends context-specific use of PCA and LDA to avoid loss of critical diagnostic information.
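The comparison described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. It is not the authors' code. It assumes synthetic features from `make_classification` in place of the ultrasound images, a single Random Forest in place of the three classifiers, macro-averaged metrics, a compression ratio defined as original feature count divided by retained components, and an SNR defined as 10·log10 of signal energy over reconstruction-error energy. The abstract does not state the study's own definitions, so these are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic three-class stand-in for flattened image features (assumption:
# this is NOT the Baheya Hospital dataset and will not reproduce its numbers).
X, y = make_classification(
    n_samples=600, n_features=200, n_informative=30,
    n_classes=3, n_clusters_per_class=2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on training data only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# PCA: a float n_components keeps the fewest components that reach
# the requested share of variance (here 90%).
pca = PCA(n_components=0.90, svd_solver="full").fit(X_train_s)

# LDA: with three classes there are at most (3 - 1) = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X_train_s, y_train)

feature_sets = {
    "full": (X_train_s, X_test_s),
    "pca": (pca.transform(X_train_s), pca.transform(X_test_s)),
    "lda": (lda.transform(X_train_s), lda.transform(X_test_s)),
}

# Train the same classifier on each representation and collect macro metrics.
scores = {}
for name, (train_feats, test_feats) in feature_sets.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train_feats, y_train)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, clf.predict(test_feats), average="macro", zero_division=0
    )
    scores[name] = {"precision": precision, "recall": recall, "f1": f1}

# Assumed compression measure: original dimensions per retained component.
compression_ratio = X_train_s.shape[1] / pca.n_components_

# Assumed SNR: signal energy over PCA reconstruction-error energy, in dB.
X_rec = pca.inverse_transform(pca.transform(X_test_s))
snr_db = 10.0 * np.log10(np.sum(X_test_s ** 2) / np.sum((X_test_s - X_rec) ** 2))

for name, metrics in scores.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
print("PCA components:", pca.n_components_)
print("compression ratio:", round(float(compression_ratio), 2))
print("reconstruction SNR (dB):", round(float(snr_db), 2))
```

The numbers printed by this sketch depend entirely on the synthetic data and say nothing about the reported results. Its purpose is to show where the two-component limit for LDA comes from and how a variance threshold, rather than a fixed count, determines the number of PCA components.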

License
Copyright (c) 2025 Journal of Current Science and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.