Products in an ecommerce catalog contain information-rich fields, such as descriptions and bullet points, from which entities (attributes) can be extracted using NER-based systems. However, these fields are often verbose and contain a lot of information that is not relevant from a search perspective. Treating every sentence in these fields equally can lead to poor full-text matching and introduce problems when extracting attributes to build ontologies, semantic search, etc. To address this issue, we describe two methods based on extractive summarization with reinforcement learning that leverage information in product titles and search click-through logs to rank sentences from bullets, descriptions, etc. Finally, we compare the accuracy of these two models.