Online reviews have become a vital source of information when purchasing a product or service. Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Although a large number of reviews exist online, only a few of them have been labeled as spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network that relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves on state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity.
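The abstract does not say how perplexity is measured. A common definition is the exponential of the average negative log-likelihood per token, and the sketch below illustrates that definition only. The function name `perplexity` and the per-token probabilities are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities that a language model
    assigned to each token of a generated review (assumed input).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a 4-token review where every token received
# probability 0.25, as if the model were choosing uniformly among 4 words.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity([math.log(p) for p in probs])
```

Under this definition, lower perplexity means the evaluating model finds the generated text less surprising; a model that is uniformly unsure among k tokens at every step has perplexity k.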