SIGIR is a major international forum for presenting new techniques and results in the field of information retrieval.

VIP content

Topic: Learning Term Discrimination

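Abstract: Document indexing is a key component of efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term frequencies (tf). Alongside tf, which only reflects the importance of a term within a document, traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative terms during retrieval. In this work, we propose to learn TDVs for document indexing with shallow neural networks that approximate traditional IR ranking functions such as TF-IDF and BM25. Our proposal outperforms traditional approaches in terms of both nDCG and recall, even when only a few positively labelled query-document pairs are available as training data. When our learned TDVs are used to filter out vocabulary terms with zero discrimination value, they both significantly reduce the memory footprint of the inverted index and speed up retrieval (BM25 becomes up to 3 times faster), without degrading retrieval quality.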
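To make the filtering idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's method: the paper learns TDVs with a shallow neural network, whereas this sketch assumes plain idf as a stand-in TDV. The function names (`build_index`, `idf`, `prune`, `bm25`), the whitespace tokenizer, and the BM25 parameters `k1=1.2` and `b=0.75` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index {term: {doc_id: tf}} plus document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return dict(index), lengths


def idf(term, index, n_docs):
    """Stand-in TDV: log(N / df). It is zero for a term found in every document."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0


def prune(index, tdv):
    """Drop posting lists of terms whose discrimination value is zero."""
    return {term: postings for term, postings in index.items() if tdv[term] > 0.0}


def bm25(query, index, lengths, tdv, k1=1.2, b=0.75):
    """BM25-style score where the TDV plays the role usually taken by idf."""
    avg_len = sum(lengths.values()) / len(lengths)
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += tdv[term] * tf * (k1 + 1) / norm
    return dict(scores)


# Hypothetical toy corpus, for illustration only.
docs = {"d1": "the cat sat", "d2": "the dog ran", "d3": "the cat ran"}
index, lengths = build_index(docs)
tdv = {term: idf(term, index, len(docs)) for term in index}
small_index = prune(index, tdv)  # "the" occurs in every document and is dropped
print(bm25("the cat", small_index, lengths, tdv))
```

Because a term with a TDV of zero contributes nothing to the score, removing its posting list leaves the scores of matching documents unchanged while the retrieval loop touches fewer postings. Under these assumptions, that is the intuition behind the memory and speed gains the abstract reports.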


Latest content

The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems: a recommendation task (where a model is shown some events at the start of a session, and it is asked to predict future product interactions); an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

