SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

Abstract
Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework. Despite its great contributions, the work has three main drawbacks: 1) noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) forming news by directly concatenating rewritten sentences limits its practicality. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer to take into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.
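
As an illustration of the second point, the following is a minimal sketch of how lexical overlap could drive pseudo-label generation. The abstract does not say how overlap is measured, so the sketch assumes unigram F1 over whitespace-separated tokens; the names `overlap_score` and `pseudo_label` and the `threshold` value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def overlap_score(news_tokens, commentary_tokens):
    """Unigram F1 between two token lists (an assumed stand-in overlap measure)."""
    if not news_tokens or not commentary_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both lists.
    common = sum((Counter(news_tokens) & Counter(commentary_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(commentary_tokens)
    recall = common / len(news_tokens)
    return 2 * precision * recall / (precision + recall)


def pseudo_label(news_sentences, commentaries, threshold=0.3):
    """Pair each news sentence with its highest-overlap commentary sentence.

    Pairs scoring below `threshold` (a hypothetical cutoff) are dropped, so
    weakly related sentences do not become training labels.
    """
    pairs = []
    for news in news_sentences:
        news_tokens = news.split()
        best, best_score = None, 0.0
        for commentary in commentaries:
            score = overlap_score(news_tokens, commentary.split())
            if score > best_score:
                best, best_score = commentary, score
        if best is not None and best_score >= threshold:
            pairs.append((news, best, best_score))
    return pairs


if __name__ == "__main__":
    news = ["Messi scores a goal in the 10th minute"]
    commentaries = [
        "Messi scores with a left-footed shot",
        "Corner kick for the home team",
    ]
    for sentence, match, score in pseudo_label(news, commentaries):
        print(f"{sentence!r} -> {match!r} (F1={score:.2f})")
```

For Chinese commentary, whitespace splitting would need to be replaced by a word segmenter or character-level tokens; the scoring logic stays the same.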

Citation
Wang, J., Li, Z., Yang, Q., Qu, J., Chen, Z., Liu, Q., & Hu, G. (2021). SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary. Proceedings of the 30th ACM International Conference on Information & Knowledge Management. doi:10.1145/3459637.3482188

Acknowledgements
This research is supported by the National Key R&D Program of China (No. 2018-AAA0101900), the Priority Academic Program Development of Jiangsu Higher Education Institutions, the National Natural Science Foundation of China (Grant Nos. 62072323, 61632016), the Natural Science Foundation of Jiangsu Province (No. BK20191420), the Suda-Toycloud Data Intelligence Joint Laboratory, and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Publisher
ACM

Conference/Event Name
30th ACM International Conference on Information and Knowledge Management, CIKM 2021

DOI
10.1145/3459637.3482188

arXiv
2110.05750

Additional Links
https://dl.acm.org/doi/10.1145/3459637.3482188
