Chinese fake review dataset generation method based on generative adversarial network (GAN)

  • Abstract: Fake reviews are rampant on the Internet, yet fake review research has no fully public Chinese dataset for Chinese fake review detection. To address this problem, a Chinese fake review data generation model based on a generative adversarial network (GAN) is proposed. First, Monte Carlo search is applied to the text sequences produced by the generator to obtain a batch of samples. Then, reinforcement learning is used to convert the feedback of the discriminator, the classifier and the reconstructor into reward scores. Finally, these rewards are passed back to the generator to optimize its parameters, so that it produces realistic fake reviews that carry the corresponding class-label attributes and features. With BLEU as the evaluation metric, experimental results show that the proposed model achieves better BLEU scores on the dataset used in this paper, indicating good generation quality.
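The three training steps above (Monte Carlo search over generated sequences, turning discriminator, classifier and reconstructor feedback into rewards, and updating the generator) resemble the SeqGAN-style policy-gradient scheme. The minimal sketch below assumes that each feedback model returns a score in [0, 1] and that the three scores are combined by a weighted sum; the abstract does not say how the rewards are combined, so `combined_score`, its equal weights, the toy scorers and the random rollout are hypothetical stand-ins, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[int]], float]
Rollout = Callable[[Sequence[int], int], List[int]]


def combined_score(seq: Sequence[int], scorers: Sequence[Scorer],
                   weights: Sequence[float]) -> float:
    """Hypothetical reward: weighted sum of the feedback models' scores."""
    return sum(w * f(seq) for w, f in zip(weights, scorers))


def mc_rewards(seq: Sequence[int], rollout: Rollout, score: Scorer,
               num_rollouts: int = 16) -> List[float]:
    """Per-token rewards via Monte Carlo search (SeqGAN-style).

    Each proper prefix seq[:t] is completed num_rollouts times by the rollout
    policy and the completions' scores are averaged; the finished sequence is
    scored directly.
    """
    length = len(seq)
    rewards = []
    for t in range(1, length + 1):
        if t < length:
            samples = [rollout(seq[:t], length) for _ in range(num_rollouts)]
            rewards.append(sum(score(s) for s in samples) / num_rollouts)
        else:
            rewards.append(score(seq))
    return rewards


def policy_gradient_loss(log_probs: Sequence[float],
                         rewards: Sequence[float]) -> float:
    """REINFORCE objective: minimizing it raises the likelihood of high-reward tokens."""
    return -sum(lp * r for lp, r in zip(log_probs, rewards))


# --- Toy usage: every component below is a hypothetical stand-in. ---
VOCAB = 5
rng = random.Random(0)


def random_rollout(prefix: Sequence[int], length: int) -> List[int]:
    # Stand-in for the generator's rollout policy: uniform random continuation.
    return list(prefix) + [rng.randrange(VOCAB) for _ in range(length - len(prefix))]


def discriminator(s: Sequence[int]) -> float:
    return sum(x != 0 for x in s) / len(s)  # toy "looks real" score


def classifier(s: Sequence[int]) -> float:
    return list(s).count(1) / len(s)  # toy "matches target label" score


def reconstructor(s: Sequence[int]) -> float:
    return 1.0 if s[0] == s[-1] else 0.5  # toy "recoverable" score


def score(s: Sequence[int]) -> float:
    return combined_score(s, (discriminator, classifier, reconstructor),
                          (1 / 3, 1 / 3, 1 / 3))


seq = [1, 2, 1, 3, 1]
rewards = mc_rewards(seq, random_rollout, score, num_rollouts=8)
loss = policy_gradient_loss([-1.2, -0.7, -0.9, -1.5, -0.4], rewards)
```

In a real model, `log_probs` would be the generator's log-probabilities of the sampled tokens and the loss would be minimized with an automatic-differentiation framework; plain floats are used here only to keep the sketch dependency-free.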
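BLEU, the evaluation metric named in the abstract, scores a generated sentence by its clipped n-gram precision against reference sentences, multiplied by a brevity penalty. The abstract does not specify the BLEU variant (n-gram order, smoothing, sentence- or corpus-level, word- or character-level tokens), so this sketch assumes unsmoothed sentence-level BLEU with uniform weights over Chinese characters; `sentence_bleu` and `ngrams` are illustrative helpers, not a library API.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], references: Sequence[Sequence[str]],
                  max_n: int = 4) -> float:
    """Unsmoothed BLEU: geometric mean of clipped precisions times brevity penalty."""
    if not candidate or not references:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    c = len(candidate)
    # Effective reference length: closest to c, ties broken toward the shorter one.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


# Character-level tokens for Chinese text (an assumption, see above).
print(sentence_bleu(list("这家餐厅的菜很好吃"), [list("这家餐厅的菜很好吃")]))  # → 1.0
```

Without smoothing, a short sentence that shares no 4-gram with any reference scores 0, so reported BLEU numbers depend on these settings.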

     
