A collection of five recent research papers on bias and debiasing in recommendation.
Title: Popularity Bias in Collaborative Filtering-Based Multimedia Recommender Systems
Published: 2022-03-01
Url: http://arxiv.org/abs/2203.00376v1
Authors: Dominik Kowald, Emanuel Lacic
Multimedia recommender systems suggest media items, e.g., songs, (digital) books and movies, to users by utilizing concepts of traditional recommender systems such as collaborative filtering. In this paper, we investigate a potential issue of such collaborative filtering-based multimedia recommender systems, namely popularity bias that leads to the underrepresentation of unpopular items in the recommendation lists. Therefore, we study four multimedia datasets, i.e., LastFm, MovieLens, BookCrossing and MyAnimeList, that we each split into three user groups differing in their inclination to popularity, i.e., LowPop, MedPop and HighPop. Using these user groups, we evaluate four collaborative filtering-based algorithms with respect to popularity bias on the item and the user level. Our findings are three-fold: firstly, we show that users with little interest in popular items tend to have large user profiles and thus are important data sources for multimedia recommender systems. Secondly, we find that popular items are recommended more frequently than unpopular ones. Thirdly, we find that users with little interest in popular items receive significantly worse recommendations than users with medium or high interest in popularity.
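The user grouping described in this abstract can be illustrated with a minimal sketch. The abstract does not state how LowPop, MedPop and HighPop are cut, so the thresholds here are assumptions: the top 20% most-interacted items count as "popular", and users are sorted by the share of popular items in their profile and split into thirds. The function name `split_users_by_popularity` and the parameter `head_fraction` are hypothetical.

```python
from collections import Counter


def split_users_by_popularity(interactions, head_fraction=0.2):
    """Group users by how strongly their profiles lean toward popular items.

    interactions: iterable of (user, item) pairs.
    head_fraction: assumed share of the catalog treated as "popular".
    """
    interactions = list(interactions)

    # Rank items by interaction count (ties broken by name for determinism).
    item_counts = Counter(item for _, item in interactions)
    ranked = [item for item, _ in
              sorted(item_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))]
    n_head = max(1, int(len(ranked) * head_fraction))
    popular = set(ranked[:n_head])

    # Build each user's profile as a set of distinct items.
    profiles = {}
    for user, item in interactions:
        profiles.setdefault(user, set()).add(item)

    # Share of popular items in each profile.
    ratio = {u: len(items & popular) / len(items) for u, items in profiles.items()}

    # Assumed split: lowest third, middle, highest third.
    ordered = sorted(ratio, key=lambda u: (ratio[u], str(u)))
    third = len(ordered) // 3
    groups = {
        "LowPop": ordered[:third],
        "MedPop": ordered[third:len(ordered) - third],
        "HighPop": ordered[len(ordered) - third:],
    }
    return groups, ratio
```

With the groups in hand, accuracy metrics can be averaged per group to compare how well each group is served.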
Title: The Unfairness of Popularity Bias in Book Recommendation
Published: 2022-02-27
Url: http://arxiv.org/abs/2202.13446v1
Authors: Mohammadmehdi Naghiaei, Hossein A. Rahmani, Mahdi Dehghan
Recent studies have shown that recommendation systems commonly suffer from popularity bias. Popularity bias refers to the problem that popular items (i.e., frequently rated items) are recommended frequently while less popular items are recommended rarely or not at all. Researchers adopted two approaches to examining popularity bias: (i) from the users' perspective, by analyzing how far a recommendation system deviates from users' expectations in receiving popular items, and (ii) by analyzing the amount of exposure that long-tail items receive, measured by overall catalog coverage and novelty. In this paper, we examine the first point of view in the book domain, although the findings may be applied to other domains as well. To this end, we analyze the well-known Book-Crossing dataset and define three user groups based on their tendency towards popular items (i.e., Niche, Diverse, Bestseller-focused). Further, we evaluate the performance of nine state-of-the-art recommendation algorithms and two baselines (i.e., Random, MostPop) from both the accuracy (e.g., NDCG, Precision, Recall) and popularity bias perspectives. Our results indicate that most state-of-the-art recommendation algorithms suffer from popularity bias in the book domain, and fail to meet the expectations of users with Niche and Diverse tastes despite their larger profile sizes. Conversely, Bestseller-focused users are more likely to receive high-quality recommendations, both in terms of fairness and personalization. Furthermore, our study shows a tradeoff between personalization and unfairness of popularity bias in recommendation algorithms for users belonging to the Diverse and Bestseller groups, that is, algorithms with high capability of personalization suffer from the unfairness of popularity bias.
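Catalog coverage, which the abstract names as one way to measure long-tail exposure, can be sketched as follows. This uses the common definition (the fraction of catalog items that appear in at least one recommendation list), which is an assumption here since the abstract does not define it. The function name `catalog_coverage` is hypothetical.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items recommended to at least one user.

    recommendations: dict mapping user -> list of recommended items.
    catalog: iterable of all items that could be recommended.
    """
    catalog = set(catalog)
    if not catalog:
        return 0.0
    # Collect every item that shows up in any user's list.
    recommended = {item for recs in recommendations.values() for item in recs}
    return len(recommended & catalog) / len(catalog)
```

A value near 1.0 means most of the catalog is exposed to someone. A low value suggests recommendations concentrate on a small set of items.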
Title: The Unfairness of Active Users and Popularity Bias in Point-of-Interest Recommendation
Published: 2022-02-27
Url: http://arxiv.org/abs/2202.13307v1
Authors: Hossein A. Rahmani, Yashar Deldjoo, Ali Tourani, Mohammadmehdi Naghiaei
Point-of-Interest (POI) recommender systems provide personalized recommendations to users and help businesses attract potential customers. Despite their success, recent studies suggest that highly data-driven recommendations could be impacted by data biases, resulting in unfair outcomes for different stakeholders, mainly consumers (users) and providers (items). Most existing fairness-related research works in recommender systems treat user fairness and item fairness issues individually, disregarding that recommender systems work in a two-sided marketplace. This paper studies the interplay between (i) the unfairness of active users, (ii) the unfairness of popular items, and (iii) the accuracy (personalization) of recommendation as three angles of our study triangle. We group users into advantaged and disadvantaged levels to measure user fairness based on their activity level. For item fairness, we divide items into short-head, mid-tail, and long-tail groups and study the exposure of these item groups in the top-k recommendation lists of users. Experimental validation of eight different recommendation models commonly used for POI recommendation (e.g., contextual, CF) on two publicly available POI recommendation datasets, Gowalla and Yelp, indicates that most well-performing models suffer seriously from the unfairness of popularity bias (provider unfairness). Furthermore, our study shows that most recommendation models cannot satisfy both consumer and producer fairness, indicating a trade-off between these variables possibly due to natural biases in data. We choose POI recommendation as our test scenario; however, the insights should be trivially extendable to other domains.
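The item-side analysis in this abstract (short-head, mid-tail and long-tail exposure in top-k lists) can be sketched roughly as below. The cut-offs are assumptions, not taken from the paper: the top 20% of items by training interactions form the short head, the next 60% the mid tail, and the rest the long tail. The function names `item_groups` and `exposure_share` are hypothetical.

```python
from collections import Counter

GROUPS = ("short-head", "mid-tail", "long-tail")


def item_groups(train_interactions, head=0.2, mid=0.6):
    """Assign each item to a popularity group using assumed cut-offs."""
    counts = Counter(item for _, item in train_interactions)
    ranked = [item for item, _ in
              sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))]
    n_head = max(1, int(len(ranked) * head))
    n_mid = int(len(ranked) * mid)
    group = {}
    for pos, item in enumerate(ranked):
        if pos < n_head:
            group[item] = "short-head"
        elif pos < n_head + n_mid:
            group[item] = "mid-tail"
        else:
            group[item] = "long-tail"
    return group


def exposure_share(recommendations, group):
    """Share of top-k recommendation slots occupied by each item group.

    Items never seen in training are counted as long-tail (an assumption).
    """
    shown = Counter(group.get(item, "long-tail")
                    for recs in recommendations.values() for item in recs)
    total = sum(shown.values())
    if total == 0:
        return {g: 0.0 for g in GROUPS}
    return {g: shown[g] / total for g in GROUPS}
```

Comparing these shares with each group's share of the catalog gives a quick indication of whether popular items are over-exposed.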
Title: Assessing Gender Bias in Particle Physics and Social Science Recommendations for Academic Jobs
Published: 2022-02-17
Url: http://arxiv.org/abs/2111.09774v4
Authors: R. H. Bernstein, M. W. Macy, C. J. Cameron, S. Williams-Ceci, W. M. Williams, S. J. Ceci
We investigated gender bias in letters of recommendation as a possible cause of the under-representation of women in Experimental Particle Physics (EPP), where about 15% of faculty are female, well below the 60% level in psychology and sociology. We analyzed 2,206 letters in EPP and these social sciences using standard lexical measures as well as two new measures: author status and an open-ended search for gendered language. In contrast to former studies, women were not depicted as more communal, less agentic, or less standout. Lexical measures revealed few gender differences in either discipline. The open-ended analysis revealed disparities favoring women in social science and men in EPP. However, female EPP candidates were characterized as "brilliant" in nearly three times as many letters as men.
Title: Unintended Bias in Language Model-driven Conversational Recommendation
Published: 2022-01-19
Url: http://arxiv.org/abs/2201.06224v2
Authors: Tianshu Shen, Jiaru Li, Mohamed Reda Bouadjenek, Zheda Mai, Scott Sanner
Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LMs) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are well-known to be prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs. We study a recently introduced LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias, i.e., language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations, manifests in significantly shifted price and category distributions of restaurant recommendations. The alarming results we observe strongly indicate that LMRec has learned to reinforce harmful stereotypes through its recommendations. For example, offhand mention of names associated with the black community significantly lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments. These and many related results presented in this work raise a red flag that advances in the language handling capability of LM-driven CRSs do not come without significant challenges related to mitigating unintended bias in future deployed CRS assistants with a potential reach of hundreds of millions of end-users.