Neural ranking models (NRMs) have become one of the most important techniques in information retrieval (IR). Due to the limited availability of relevance labels, the training of NRMs relies heavily on negative sampling over unlabeled data. In general machine learning scenarios, it has been shown that training with hard negatives (i.e., samples that are close to positives) can lead to better performance. Surprisingly, our empirical studies in IR find the opposite: when we sample top-ranked results (excluding the labeled positives) from a stronger retriever as negatives, the performance of the learned NRM becomes even worse. Based on our investigation, the superficial reason is that the top-ranked results of a stronger retriever contain more false negatives (i.e., unlabeled positives), which may hurt the training process; the root cause is the pooling bias introduced during dataset construction, where annotators judge and label only a small set of samples selected by a few basic retrievers. Therefore, in principle, we can formulate the false negative issue in training NRMs as learning from labeled datasets with pooling bias. To address this problem, we propose a novel Coupled Estimation Technique (CET) that simultaneously learns a relevance model and a selection model to correct the pooling bias when training NRMs. Empirical results on three retrieval benchmarks show that NRMs trained with our technique achieve significant gains in ranking effectiveness over other baseline strategies.
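For concreteness, the negative sampling setup described above can be illustrated with a minimal sketch: negatives for a query are drawn from a retriever's top-ranked results after removing the labeled positives, so any unlabeled positives remaining in the pool end up treated as negatives. The function name, parameters, and top-k cutoff below are illustrative assumptions, not details taken from the paper.

```python
import random

def sample_hard_negatives(ranked_doc_ids, labeled_positive_ids, num_negatives=4, top_k=100):
    """Draw negatives for one query from the top-k results of a retriever,
    skipping documents already labeled as positive. Any unlabeled positives
    left in this pool become false negatives, the issue the abstract studies."""
    pool = [doc for doc in ranked_doc_ids[:top_k] if doc not in labeled_positive_ids]
    return random.sample(pool, min(num_negatives, len(pool)))

# Illustrative usage with made-up document ids:
negatives = sample_hard_negatives(
    ranked_doc_ids=["d7", "d2", "d9", "d4", "d1"],
    labeled_positive_ids={"d2"},
    num_negatives=2,
)
```

With a stronger retriever, the unlabeled documents near the top of `ranked_doc_ids` are more likely to be relevant, which is why this seemingly better candidate pool can hurt training.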
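To make the coupling idea more tangible, the sketch below assumes a positive-unlabeled-style factorization in which a query-document pair appears as a labeled positive only if it is both relevant and selected into the judged pool. This factorization, the class name, and the loss shape are hypothetical illustrations of the coupling principle; the paper's actual CET formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEstimationLoss(nn.Module):
    """Sketch of coupling a relevance model with a selection model.

    Assumed factorization (hypothetical, not stated in the abstract):
    P(labeled) = P(relevant) * P(selected). Fitting this product to the
    observed labels lets the selection head absorb the pooling bias so the
    relevance head is not penalized for unjudged (false-negative) documents.
    """

    def forward(self, relevance_logits, selection_logits, labels):
        p_rel = torch.sigmoid(relevance_logits)   # relevance model estimate
        p_sel = torch.sigmoid(selection_logits)   # selection model estimate
        p_labeled = (p_rel * p_sel).clamp(1e-6, 1 - 1e-6)
        # labels: 1.0 for annotated positives, 0.0 for sampled (possibly false) negatives
        return F.binary_cross_entropy(p_labeled, labels)

# Illustrative call with random scores; in practice both logit vectors would
# come from two jointly trained scoring heads over query-document pairs.
loss_fn = CoupledEstimationLoss()
loss = loss_fn(torch.randn(8), torch.randn(8), torch.randint(0, 2, (8,)).float())
```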