MilkQA: a Dataset of Consumer Questions for the Task of Answer Selection

We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in Portuguese and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while the answers were written by specialists from Embrapa's customer service. The dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question, typically asked as a way of seeking information. Although several question answering datasets are available, most such resources are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making MilkQA publicly available. We study the behavior of four answer selection models on MilkQA: two baseline models and two convolutional neural network architectures. Our results show that MilkQA poses real challenges to computational models, particularly due to the linguistic characteristics of its questions and to their unusually long length. Only one of the evaluated models gives reasonable results, and it does so at the cost of high computational requirements.
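Answer selection, as studied here, means ranking a pool of candidate answers so that the correct answer to a question appears first. The sketch below illustrates that task framing with a simple bag-of-words cosine-similarity ranker and the reciprocal rank of the correct answer. The scorer, the function names (`rank_answers`, `reciprocal_rank`), and the toy Portuguese examples are hypothetical illustrations; they are not claimed to be the paper's baselines, its CNN models, or its evaluation protocol.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; \w is Unicode-aware in
    # Python 3, so accented Portuguese letters stay inside tokens.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_answers(question: str, candidates: list[str]) -> list[int]:
    # Hypothetical baseline: order candidate indices from most to least
    # lexically similar to the question.
    q = Counter(tokenize(question))
    scores = [cosine(q, Counter(tokenize(c))) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(ranking: list[int], gold: int) -> float:
    # 1 / (1-based position of the correct answer in the ranking).
    return 1.0 / (ranking.index(gold) + 1)


# Toy example (invented, not taken from MilkQA).
question = "Qual a temperatura ideal para armazenar o leite cru?"
candidates = [
    "O queijo deve ser maturado por trinta dias.",
    "O leite cru deve ser armazenado a uma temperatura de até 4 graus.",
    "Vacas precisam de água limpa à vontade.",
]
ranking = rank_answers(question, candidates)
print(ranking, reciprocal_rank(ranking, gold=1))
```

A purely lexical scorer like this one ignores word order and morphology (for example, "armazenar" and "armazenado" do not match), which hints at why long, informally written consumer questions can be hard for simple baselines.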
