Studies show that developers' answers to mobile app users' feedback on app stores can increase the apps' star ratings. To help app developers write answers that address the users' issues, recent studies have developed models that generate the answers automatically. Aims: App response generation models use deep neural networks and require training data. Pre-trained neural language models (PTMs) used in Natural Language Processing (NLP) leverage information learned from large corpora in an unsupervised manner, and can reduce the amount of training data required. In this paper, we evaluate PTMs for generating replies to mobile app user feedback. Method: We train a Transformer model from scratch and fine-tune two PTMs, then compare their generated responses with those of RRGEN, a current app response generation model. We also evaluate the models with different portions of the training data. Results: On a large dataset, automatic metrics show that PTMs obtain lower scores than the baselines. However, our human evaluation confirms that PTMs can generate more relevant and meaningful responses to the posted feedback. Moreover, the performance of PTMs drops less than that of the other models when the amount of training data is reduced to one third. Conclusion: PTMs are useful for generating responses to app reviews and are more robust to the amount of training data provided. However, their prediction time is 19 times that of RRGEN. This study can open new avenues for research on adapting PTMs to the analysis of mobile app user feedback.
Index Terms: mobile app user feedback analysis, neural pre-trained language models, automatic answer generation