Businesses and customers can gain valuable information from product reviews. The sheer number of reviews often necessitates ranking them by their potential helpfulness. However, only a few reviews on online marketplaces ever receive any helpfulness votes. Sorting all reviews based on these few existing votes can cause helpful reviews to go unnoticed, given the limited attention span of readers. The problem of review helpfulness prediction becomes even more important as review volumes grow, and for newly written reviews or newly launched products. In this work we compare the use of the RoBERTa and XLM-R language models to predict the helpfulness of online product reviews. The contributions of our work in relation to the literature include extensively investigating the efficacy of state-of-the-art language models -- both monolingual and multilingual -- against a robust baseline, taking ranking metrics into account when assessing these approaches, and assessing multilingual models for the first time. We employ the Amazon review dataset for our experiments. According to our study on several product categories, multilingual and monolingual pre-trained language models outperform a baseline that uses a random forest with handcrafted features by as much as 23% in RMSE. Pre-trained language models reduce the need for complex text feature engineering. However, our results suggest that pre-trained multilingual models may not be well suited to fine-tuning on only a single language. We assess the performance of language models with and without additional features. Our results show that including additional features, such as the product rating given by the reviewer, can further help the predictive methods.
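The abstract mentions evaluating the models both with an error metric (RMSE) and with ranking metrics, since the practical goal is to rank reviews by predicted helpfulness. As a minimal sketch of what such an evaluation looks like, the following stand-alone Python snippet computes RMSE and NDCG@k over toy helpfulness scores; the toy data and the choice of NDCG as the ranking metric are illustrative assumptions, not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between true and predicted helpfulness scores.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ndcg_at_k(y_true, y_pred, k):
    # Rank reviews by predicted score; the gain of each position is the
    # true helpfulness of the review placed there.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)[:k]
    dcg = sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sorted(y_true, reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: ground-truth helpfulness ratios vs. model predictions.
true_help = [0.9, 0.1, 0.6, 0.3]
pred_help = [0.8, 0.2, 0.4, 0.5]  # reviews 2 and 3 are ranked in the wrong order

print(round(rmse(true_help, pred_help), 3))       # pointwise error
print(round(ndcg_at_k(true_help, pred_help, 3), 3))  # ranking quality of the top 3
```

A lower RMSE does not automatically mean a better ranking (and vice versa), which is why the paper argues for reporting ranking metrics alongside RMSE.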