Appositeness of Hoeffding tree models for breast cancer classification


  • Tina Elizabeth Mathew, Computer Science, Government College Kariavattom, Trivandrum, Kerala 695581, India


bagging, boosting, breast cancer (BC), class balancer (CB), decision tree (DT), ensemble, Hoeffding tree (HT)


Supervised machine learning models have been shown to be effective in disease-related classification and prediction tasks. Decision trees are a prominent category of supervised learners; they comprise an assortment of tree classifiers, each of which is used extensively for various classification problems. In this paper, a subcategory of decision trees known as Hoeffding trees is employed to classify breast cancer tumours as malignant or benign. A Hoeffding tree is a decision tree classifier that is usually effective when working with data streams. We explore the performance and appropriateness of Hoeffding trees for this task by implementing both individual and ensemble Hoeffding tree models. The proposed class-balancer Hoeffding tree model demonstrated the best performance among the Hoeffding tree models employed, yielding an accuracy of 97.9%. Several other performance measures are also used to evaluate the implemented models. This work highlights the appositeness of Hoeffding tree models for breast cancer classification.
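For readers unfamiliar with the technique, the following minimal Python sketch illustrates the two ideas named in the abstract. It is not the paper's implementation. The split test uses the standard Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ the allowed error probability, and n the number of instances seen at a leaf. The class-balancing function is an assumption about what a class balancer does (reweighting instances so that each class carries equal total weight); the abstract does not specify the mechanism. All function names and the example values are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split a leaf once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def class_balance_weights(labels):
    """Assumed class-balancer behaviour: give every class the same total weight.

    The sum of all weights stays equal to the number of instances.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    total = len(labels)
    num_classes = len(counts)
    return [total / (num_classes * counts[y]) for y in labels]


if __name__ == "__main__":
    # Illustrative values only: the bound shrinks as more instances arrive.
    for n in (100, 200, 1000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))

    # Three benign ("B") instances and one malignant ("M") instance.
    print(class_balance_weights(["B", "B", "B", "M"]))
```

Because the bound shrinks as n grows, a leaf that cannot yet be split with confidence may become splittable after more instances arrive, which is what makes the method suitable for data streams.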






How to Cite

Tina Elizabeth Mathew. (2023). Appositeness of Hoeffding tree models for breast cancer classification. Journal of Current Science and Technology, 12(3), 391–407. Retrieved from



Research Article