We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. People's opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method's scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.
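To make the core idea concrete, the following is a minimal point-estimate sketch of learning low-rank per-user utilities from pairwise labels. It is not the paper's Bayesian method (no Gaussian processes, no stochastic variational inference, no input features): it fits a simple logistic matrix-factorisation model by stochastic gradient ascent on synthetic data, where a user prefers item i over item j with probability given by a sigmoid of the utility difference. All names, sizes, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 20 users, 10 items, rank-3 ground-truth utility matrix (assumed sizes).
n_users, n_items, rank = 20, 10, 3
U_true = rng.normal(size=(n_users, rank))
V_true = rng.normal(size=(n_items, rank))
F_true = U_true @ V_true.T  # F_true[u, j] = utility of item j for user u


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Sample noisy pairwise labels: user u labels i > j with prob sigmoid(f_ui - f_uj).
pairs = []
for _ in range(2000):
    u = rng.integers(n_users)
    i, j = rng.choice(n_items, size=2, replace=False)
    y = int(rng.random() < sigmoid(F_true[u, i] - F_true[u, j]))
    pairs.append((u, i, j, y))

# Regularised MAP-style fit of the low-rank factors by stochastic gradient ascent.
U = rng.normal(scale=0.1, size=(n_users, rank))
V = rng.normal(scale=0.1, size=(n_items, rank))
lr, reg = 0.05, 0.01
for epoch in range(200):
    rng.shuffle(pairs)
    for u, i, j, y in pairs:
        uu = U[u].copy()          # snapshot before updating, so all gradients
        d = V[i] - V[j]           # use the same pre-update parameters
        g = y - sigmoid(uu @ d)   # gradient of the Bernoulli log-likelihood
        U[u] += lr * (g * d - reg * U[u])
        V[i] += lr * (g * uu - reg * V[i])
        V[j] += lr * (-g * uu - reg * V[j])

# Evaluate: does the model recover each user's true preference direction?
correct = 0
n_test = 1000
for _ in range(n_test):
    u = rng.integers(n_users)
    i, j = rng.choice(n_items, size=2, replace=False)
    true_pref = F_true[u, i] > F_true[u, j]
    pred_pref = U[u] @ (V[i] - V[j]) > 0
    correct += int(true_pref == pred_pref)
acc = correct / n_test
print(f"pairwise accuracy: {acc:.2f}")

# A crowd-consensus ranking can be read off as the mean utility across users.
consensus = (U @ V.T).mean(axis=0)
```

The paper's method replaces these point estimates with Gaussian-process priors over the factors (so that item and user features drive predictions for unseen items and users) and fits them with stochastic variational inference to keep memory and compute bounded.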