Comparative Study on Automated Reference Summary Generation using BERT Models and ROUGE Score Assessment

Nattapong Sanchan

doi:10.59796/jcst.V14N2.2024.26

Authors

Nattapong Sanchan School of Information Technology and Innovation, Bangkok University, Pathum Thani 12120, Thailand

DOI:

https://doi.org/10.59796/jcst.V14N2.2024.26

Keywords:

automatic summarization, document clustering, k-means, centroid-based summarization, natural language processing, text mining

Abstract

Automatic text summarization is a sub-area of text mining in which a computer system identifies the most informative content in the original text to produce a summary for particular tasks and users. In developing such systems, one of the most important tasks is evaluating the quality of the summaries they produce. This evaluation is generally laborious, time-consuming, and expensive because it requires significant human annotation effort to create reference summaries manually. Generating reference summaries automatically would speed up both the development and the evaluation of summarization systems. In this paper, we propose an Auto-Ref Summary Generation framework for automatically generating reference summaries used in the generic text summarization evaluation task, the Sliced Summary. Given a set of clusters from a cluster ground-truth label dataset, variants of BERT models were used to create cluster representations. The automatic reference summaries were then generated through a centroid-based summarization approach. Overall, DistilBERT, RoBERTa, and SBERT played crucial roles in automatic summary generation, achieving the highest ROUGE-1 score of 0.47060. However, the generated summaries did not meet our expectations for text coherence and readability. Although the summaries generated through our proposed framework cannot replace manual summaries, this study sheds new light on acquiring automatic reference summaries from a ground-truth label dataset.
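As a rough illustration of the two ideas named in the abstract, the sketch below selects the sentences closest to a cluster centroid and scores a candidate summary with a simplified ROUGE-1. It is not the paper's implementation. The function names (`centroid_select`, `rouge_1`) are hypothetical, the input vectors are assumed to stand in for sentence embeddings from a BERT variant, and the ROUGE-1 here uses plain lowercase whitespace tokenization without the stemming or stopword options of the official ROUGE package.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid_select(sentences, vectors, k=1):
    """Return the k sentences whose vectors lie closest to the cluster centroid.

    `vectors` is assumed to hold one embedding per sentence (for example,
    produced by a BERT variant); any fixed-length numeric vectors work here.
    """
    dim = len(vectors[0])
    # The centroid is the element-wise mean of all sentence vectors.
    centroid = [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
    # Rank sentence indices by similarity to the centroid, most similar first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

In this sketch, a generated summary would be compared against a manual reference by calling `rouge_1(generated, manual)` and reading the `recall`, `precision`, or `f1` entry, depending on which variant of the score is being reported.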
