Multi-document summarization is challenging because a summary must not only capture the most important information across all documents but also present a coherent interpretation of them. This paper proposes a method for multi-document summarization based on cluster similarity, and considers both extractive and abstractive approaches. For the extractive approach, we use a hybrid model that combines a modified version of the PageRank algorithm with a text correlation mechanism. After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to build the abstractive models. The proposed method achieves competitive results in the VLSP 2022 competition.
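
To make the extractive step concrete, the following is a minimal sketch of PageRank-style sentence ranking within a single cluster. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify how PageRank is modified or how text correlation is computed, so the sketch uses the standard weighted PageRank update with bag-of-words cosine similarity as edge weights. The names `cosine`, `rank_sentences`, and `extract_summary`, and the defaults `damping=0.85`, `iterations=50`, and `k=2`, are hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with plain weighted PageRank (assumed, unmodified variant)."""
    n = len(sentences)
    if n == 0:
        return []
    # Whitespace tokenization is an assumption; a real system would use a
    # language-specific tokenizer.
    bags = [Counter(s.lower().split()) for s in sentences]
    # Edge weight between two distinct sentences is their cosine similarity.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Sentences with no outgoing weight pass nothing on, so scores
            # are relative and need not sum to 1.
            incoming = sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def extract_summary(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a cluster-based pipeline of the kind described above, `extract_summary` would be called once per sentence cluster, and the selected sentences could then serve as input to an abstractive model such as BARTpho or ViT5.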