Appositeness of Hoeffding tree models for breast cancer classification
Keywords:
bagging, boosting, breast cancer (BC), class balancer (CB), decision tree (DT), ensemble, Hoeffding tree (HT)
Abstract
Supervised machine learning models, built on a variety of classifiers, have proven effective in disease-related classification and prediction tasks. Decision trees are a prominent category of supervised learners; the category comprises an assortment of tree classifiers, each of which is used extensively for classification problems. This paper employs one subcategory, Hoeffding trees, to classify breast cancer tumours as malignant or benign. A Hoeffding tree is a decision tree classifier that is usually effective when working with data streams. We explore the performance and appropriateness of Hoeffding trees for this task, implementing both individual and ensemble Hoeffding tree models. Among the models implemented, a class-balancer Hoeffding tree model demonstrated the best performance, yielding an accuracy of 97.9%. Several other performance measures are also used to evaluate the implemented models. This work highlights the appositeness of Hoeffding tree models for breast cancer classification.
License
Copyright (c) 2022 Journal of Current Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.