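Automatic summarization (also called automatic document summarization) analyzes one or more given documents, distills their key points, and outputs a short, readable summary (typically a few sentences or a few hundred words). Sentences in the summary may be taken directly from the source text or newly written. In short, summarization compresses and distills the source text to give users a concise description. By reading the short summary, users can learn the main content of the source and save considerable reading time.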
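The first option described above, taking summary sentences directly from the source text, can be illustrated with a small frequency-based extractive summarizer. This is only a minimal sketch: the stopword list, the regex-based sentence splitter, and the function names are assumptions made for illustration, and the scoring rule (average word frequency per sentence) is one simple heuristic among many, not the method of any particular paper listed here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "on", "with", "as", "by", "this", "are", "was", "be"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, max_sentences=2):
    """Pick the sentences whose words are most frequent in the document."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order so the summary reads naturally.
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `extractive_summary(text, max_sentences=2)` returns at most two sentences copied verbatim from `text`. An abstractive system, by contrast, would generate new sentences, which is what the neural models in the later sections aim to do.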

Automatic Summarization: a Zhuanzhi (专知) Collection

Getting Started

  1. Automatic summarization series (parts 1-13)
    [http://rsarxiv.github.io/tags/seq2seq/]
  2. Text summarization with TensorFlow (official Google release)
    [https://research.googleblog.com/2016/08/text-summarization-with-tensorflow.html]
  3. Your tl;dr by an ai: a deep reinforced model for abstractive summarization (reinforcement learning for document summarization)
    [https://einstein.ai/research/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization]
  4. Teaching machines to summarize
    [https://zhuanlan.zhihu.com/p/21426100?refer=paperweekly]

Surveys

  1. Nenkova A, McKeown K. Automatic summarization[M]. Now Publishers Inc, 2011.
    https://www.cis.upenn.edu/~nenkova/1500000015-Nenkova.pdf
    https://aclanthology.org/P11-5003.pdf (86-page slide deck)
  2. Text Summarization Techniques: A Brief Survey
    [https://arxiv.org/pdf/1707.02268.pdf]
  3. A Survey of Text Summarization Techniques
    [http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf]
  4. Recent automatic text summarization techniques: a survey
    [https://link.springer.com/article/10.1007/s10462-016-9475-9]
  5. A survey of nearly 70 years of automatic text summarization research. Liu Jiayi (刘家益) and Zou Yimin (邹益民).
    [http://210.76.106.46/qk/90051A/201707/672573777.html]

Advanced Papers

1958

  1. Hans Peter Luhn. 1958. The automatic creation of literature abstracts. IBM Journal of research and development 2, 2 (1958), 159–165.
    [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5392672]

1988

  1. Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513–523.
    [http://www.sciencedirect.com/science/article/pii/0306457388900210]

1999

  1. Inderjeet Mani and Eric Bloedorn. 1999. Summarizing similarities and differences among related documents. Information Retrieval 1, 1-2 (1999), 35–67.

2000

  1. Einat Amitay and Cécile Paris. 2000. Automatically summarising web sites: is there a way around it?. In Proceedings of the ninth international conference on Information and knowledge management. ACM, 173–179.
    [https://dl.acm.org/citation.cfm?id=354756.354816]
  2. Dragomir R Radev, Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization. Association for Computational Linguistics, 21–30.
    [http://www.docin.com/p-853652484.html]

2001

  1. John M Conroy and Dianne P O’Leary. 2001. Text summarization via hidden markov models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 406–407.
    [http://pdfs.semanticscholar.org/1213/3cfc6688cc2cdea57595b045a28b94d98f1d.pdf]
  2. Yihong Gong and Xin Liu. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 19–25.
    [https://dl.acm.org/citation.cfm?doid=383952.383955]

2002

  1. Inderjeet Mani, Gary Klein, David House, Lynette Hirschman, Therese Firmin, and Beth Sundheim. 2002. SUMMAC: a text summarization evaluation. Natural Language Engineering 8, 01 (2002), 43–68.
    [https://www.researchgate.net/publication/231901086_SUMMAC_a_text_summarization_evaluation]
  2. Qiaozhu Mei and ChengXiang Zhai. 2008. Generating Impact-Based Summaries for Scientific Literature. In ACL, Vol. 8. Citeseer, 816–824.
  3. Dragomir R Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the special issue on summarization. Computational linguistics 28, 4 (2002), 399–408.
    [https://dl.acm.org/citation.cfm?id=638178.638179]

2003

  1. J-Y Delort, Bernadette Bouchon-Meunier, and Maria Rifqi. 2003. Enhanced web document summarization using hyperlinks. In Proceedings of the fourteenth ACM conference on Hypertext and hypermedia. ACM, 208–215.
    [http://dl.acm.org/citation.cfm?id=900097]
  2. Paula S Newman and John C Blitzer. 2003. Summarizing archived discussions: a beginning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 273–276.
    [https://dl.acm.org/citation.cfm?id=604097]

2004

  1. Günes Erkan and Dragomir R Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res.(JAIR) 22, 1 (2004), 457–479.
    [https://arxiv.org/abs/1109.2128]
  2. Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics.
    [https://digital.library.unt.edu/ark:/67531/metadc30962/]
  3. Ani Nenkova and Amit Bagga. 2004. Facilitating email thread access by extractive summary generation. Recent advances in natural language processing III: selected papers from RANLP 2003 (2004), 287.
    [https://www.researchgate.net/publication/221303547_Facilitating_email_thread_access_by_extractive_summary_generation]
  4. Dragomir R Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management 40, 6 (2004), 919–938.
    [http://www.sciencedirect.com/science/article/pii/S0306457303000955]
  5. Owen Rambow, Lokesh Shrestha, John Chen, and Christy Lauridsen. 2004. Summarizing email threads. In Proceedings of HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, 105–108.
    [https://dl.acm.org/citation.cfm?id=1614011]
  6. TextRank: Bringing Order into Texts
    [https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf]

2005

  1. Sanda Harabagiu and Finley Lacatusu. 2005. Topic themes for multi-document summarization. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 202–209.
    [https://dl.acm.org/citation.cfm?id=1076071]
  2. Rada Mihalcea and Paul Tarau. 2005. A language independent algorithm for single and multiple document summarization. (2005).
    [https://www.researchgate.net/publication/228340005_A_language_independent_algorithm_for_single_and_multiple_document_summarization]
  3. Sentence Extraction Based Single Document Summarization
    [http://oldwww.iiit.ac.in/cgi-bin/techreports/display_detail.cgi?id=IIIT/TR/2008/97]

2006

  1. Ping Chen and Rakesh Verma. 2006. A query-based medical information summarization system using ontology knowledge. In Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on. IEEE, 37–42.
    [https://dl.acm.org/citation.cfm?id=1153019]
  2. Ben Hachey, Gabriel Murray, and David Reitter. 2006. Dimensionality reduction aids term co-occurrence based multi-document summarization. In Proceedings of the workshop on task-focused summarization and question answering. Association for Computational Linguistics, 1–7.
    [http://www.ltg.ed.ac.uk/np/publications/ltg/papers/Hachey2006Dimensionality.pdf]
  3. Hal Daumé III and Daniel Marcu. 2006. Bayesian query-focused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 305–312.
    [https://dl.acm.org/citation.cfm?id=1220214]

2007

  1. Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2007. Comments-oriented blog summarization by sentence extraction. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 901–904.
    [https://dl.acm.org/citation.cfm?id=1321571&CFID=824361189&CFTOKEN=11022411]

2008

  1. Leonhard Hennig, Winfried Umbrath, and Robert Wetzker. 2008. An ontology-based approach to text summarization. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT’08. IEEE/WIC/ACM International Conference on, Vol. 3. IEEE, 291–294.
    [http://dl.acm.org/citation.cfm?id=1487345]
  2. Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2008. Comments-oriented document summarization: understanding documents with readers’ feedback. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 291–298.
    [https://dl.acm.org/citation.cfm?id=1390385&CFID=824361189&CFTOKEN=11022411]
  3. Vahed Qazvinian and Dragomir R Radev. 2008. Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 689–696.
    [https://dl.acm.org/citation.cfm?id=1599081.1599168]

2010

  1. Asli Celikyilmaz and Dilek Hakkani-Tur. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 815–824.
    [https://dl.acm.org/citation.cfm?id=1858765]
  2. Vishal Gupta and Gurpreet Singh Lehal. 2010. A survey of text summarization extractive techniques. Journal of Emerging Technologies in Web Intelligence 2, 3 (2010), 258–268.
    [http://www.learnpunjabi.org/pdf/survey-paper.pdf]
  3. Makbule Gulcin Ozsoy, Ilyas Cicekli, and Ferda Nur Alpaslan. 2010. Text summarization of turkish texts using latent semantic analysis. In Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, 869–876.
    [https://dl.acm.org/citation.cfm?id=1873879]

2011

  1. Rasim M Alguliev, Ramiz M Aliguliyev, Makrufa S Hajirahimova, and Chingiz A Mehdiyev. 2011. MCMR: Maximum coverage and minimum redundant text summarization model. Expert Systems with Applications 38, 12 (2011), 14514–14522.
    [http://www.sciencedirect.com/science/article/pii/S0957417411008177]
  2. Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 481–490.
    [https://dl.acm.org/citation.cfm?id=2002534]
  3. John Hannon, Kevin McCarthy, James Lynch, and Barry Smyth. 2011. Personalized and automatic social summarization of events in video. In Proceedings of the 16th international conference on Intelligent user interfaces. ACM, 335–338.
    [https://dl.acm.org/citation.cfm?id=1943459]
  4. You Ouyang, Wenjie Li, Sujian Li, and Qin Lu. 2011. Applying regression models to query-focused multi-document summarization. Information Processing & Management 47, 2 (2011), 227–237.
    [http://www.sciencedirect.com/science/article/pii/S0306457310000257]

2012

  1. Elena Lloret and Manuel Palomar. 2012. Text summarisation in progress: a literature review. Artificial Intelligence Review 37, 1 (2012), 1–41.
    [https://link.springer.com/article/10.1007%2Fs10462-011-9216-z]
  2. Ani Nenkova and Kathleen McKeown. 2012. A survey of text summarization techniques. In Mining Text Data. Springer, 43–76.
    [https://www.mendeley.com/research-papers/survey-text-summarization-techniques/]

2013

  1. Rasim M Alguliev, Ramiz M Aliguliyev, and Nijat R Isazade. 2013. Multiple documents summarization based on evolutionary optimization algorithm. Expert Systems with Applications 40, 5 (2013), 1675–1689.
    [http://www.sciencedirect.com/science/article/pii/S0957417412010688]
  2. Elena Baralis, Luca Cagliero, Saima Jabeen, Alessandro Fiori, and Sajid Shah. 2013. Multi-document summarization based on the Yago ontology. Expert Systems with Applications 40, 17 (2013), 6976–6984.
    [http://www.sciencedirect.com/science/article/pii/S0957417413004429]
  3. Freddy Chong Tat Chua and Sitaram Asur. 2013. Automatic Summarization of Events from Social Media. In ICWSM.
    [https://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6057/0]
  4. Zhaochun Ren, Shangsong Liang, Edgar Meij, and Maarten de Rijke. 2013. Personalized time-aware tweets summarization. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 513–522.
    [https://staff.fnwi.uva.nl/m.derijke/wp-content/papercite-data/pdf/ren-personalized-2013.pdf]
  5. Horacio Saggion and Thierry Poibeau. 2013. Automatic text summarization: Past, present and future. In Multi-source, Multilingual Information Extraction and Summarization. Springer, 3–21.
    [https://hal.archives-ouvertes.fr/hal-00782442/document]
  6. Beaux P Sharifi, David I Inouye, and Jugal K Kalita. 2013. Summarization of Twitter Microblogs. Comput. J. (2013), bxt109.
    [http://cs.uccs.edu/~jkalita/papers/2013/SharifiBeauxComputerJournal2013.pdf]
  7. Text summarization using Latent Semantic Analysis
    [https://www.researchgate.net/publication/220195824_Text_summarization_using_Latent_Semantic_Analysis]

2014

  1. Liu Na, Li Ming-xia, Lu Ying, Tang Xiao-jun, Wang Hai-wen, and Xiao Peng. 2014. Mixture of topic model for multi-document summarization. In Control and Decision Conference (2014 CCDC), The 26th Chinese. IEEE, 5168–5172.
    [http://ieeexplore.ieee.org/document/6853102/metrics]
  2. Vahed Qazvinian, Dragomir R Radev, Saif M Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, and Taesun Moon. 2014. Generating extractive summaries of scientific paradigms. arXiv preprint arXiv:1402.0556 (2014).
    [https://www.researchgate.net/publication/229534087_Generating_surveys_of_scientific_paradigms]
  3. Yogesh Sankarasubramaniam, Krishnan Ramanathan, and Subhankar Ghosh. 2014. Text summarization using Wikipedia. Information Processing & Management 50, 3 (2014), 443–461.
    [http://www.sciencedirect.com/science/article/pii/S0306457314000119]

2015

  1. A Neural Attention Model for Abstractive Sentence Summarization
    [https://arxiv.org/pdf/1509.00685.pdf]

2016

  1. Neural Summarization by Extracting Sentences and Words
    [https://arxiv.org/pdf/1603.07252.pdf]
  2. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
    [https://arxiv.org/pdf/1602.06023.pdf]

2017

  1. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, and K. Kochut. 2017. A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques. ArXiv e-prints (2017). arXiv:1707.02919
    [https://arxiv.org/abs/1707.02919]
  2. E. D. Trippe, J. B. Aguilar, Y. H. Yan, M. V. Nural, J. A. Brady, M. Assefi, S. Safaei, M. Allahyari, S. Pouriyeh, M. R. Galinski, J. C. Kissinger, and J. B. Gutierrez. 2017. A Vision for Health Informatics: Introducing the SKED Framework. An Extensible Architecture for Scientific Knowledge Extraction from Data. ArXiv e-prints (2017). arXiv:1706.07992
    [https://arxiv.org/abs/1706.07992]
  3. A Deep Reinforced Model for Abstractive Summarization
    [https://arxiv.org/pdf/1705.04304.pdf]

Code

  1. Sequence-to-Sequence with Attention Model for Text Summarization.
    [https://github.com/tensorflow/models/tree/master/research/textsum]
  2. gensim.summarization offers TextRank summarization (available in Gensim 3.x; the module was removed in Gensim 4.0)
    [https://radimrehurek.com/gensim/summarization/summariser.html]
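For readers who want to see the idea behind TextRank-style summarization without installing a library, the following is a minimal sketch. It is not Gensim's implementation: the function names, the naive regex tokenizer, and the fixed number of iterations are assumptions made for illustration. Sentences are nodes in a graph, edge weights are a word-overlap similarity normalized by log sentence lengths (computed here on token sets, a simplification), and scores are obtained by a PageRank-style iteration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of word tokens (no stemming or stopword removal)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalized by log lengths, computed on token sets."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style iteration."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbors' scores,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summary(text, max_sentences=2):
    """Return the top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:max_sentences]))
```

A sentence that shares no words with any other sentence receives only the baseline score `1 - damping`, so off-topic sentences tend to be ranked last. A production system would typically add stopword removal, stemming, and a convergence check instead of a fixed iteration count.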

Tutorial

  1. Automatic text summarization: current state and future. Wan Xiaojun (万小军), Peking University, October 16, 2016
    [https://pan.baidu.com/s/1nuTUrSP]
  2. Tutorial on automatic summarization
    [https://www.slideshare.net/dinel/orasan-ranlp2009]
    [https://pan.baidu.com/s/1o8bZJJk]
  3. How to Run Text Summarization with TensorFlow
    [https://hackernoon.com/how-to-run-text-summarization-with-tensorflow-d4472587602d]
  4. Text Summarization with Gensim
    [https://rare-technologies.com/text-summarization-with-gensim/]

Datasets

  1. DUC 2004
    [http://www.cis.upenn.edu/~nlp/corpora/sumrepo.html]
  2. Opinosis Dataset - Topic related review sentences
    [http://kavita-ganesan.com/opinosis-opinion-dataset]
  3. 17 Timelines
    [http://kavita-ganesan.com/opinosis-opinion-dataset]
  4. Legal Case Reports Data Set
    [http://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports]

Domain Experts

  1. Wan Xiaojun (万小军), Peking University
    [https://sites.google.com/site/wanxiaojun1979/]
  2. Qin Bing (秦兵), Harbin Institute of Technology
    [https://m.weibo.cn/u/1880324342?sudaref=login.sina.com.cn&retcode=6102]
  3. Liu Ting (刘挺)
    [http://homepage.hit.edu.cn/pages/liuting]

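This is a preliminary version and our expertise is limited; suggestions and additions for any errors or omissions are welcome, and the list will be kept up to date. This article is original content from the Zhuanzhi (专知) content team and may not be reproduced without permission. To request permission, email fangquanyi@gmail.com or contact the Zhuanzhi assistant on WeChat (Rancho_Fang).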

