LI Gong-jin, SHAO Yu-bin, DU Qing-zhi, LONG Hua, MA Di-nan. Toxic comments detection based on capsule network[J]. Journal of Yunnan University: Natural Sciences Edition. DOI: 10.7540/j.ynu.20230023

Toxic comments detection based on capsule network

Abstract: Traditional toxic comment detection models cannot keep pace with constantly evolving online culture and language habits, and neural networks lose information. To address these problems, a capsule network-based detection model is proposed. First, BERT is used to extract word-vector features, preserving the latent semantic information of the text. Next, a capsule network extracts feature representations over local ranges and a Bi-LSTM extracts feature representations over the global range, so that together they yield a more comprehensive representation. An attention mechanism then fuses the local and global representations, extracting the key information and reducing the dimensionality of the representation. Finally, a Sigmoid classifier classifies the fused representation and outputs the detection result. Experimental results show that the proposed combined model extracts finer-grained semantic information than traditional models, effectively improving classification performance and reaching an accuracy of 0.922 on the toxic comment detection task.
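The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the standard squash non-linearity and routing-by-agreement of Sabour et al. (2017). The abstract does not state which capsule variant, how many routing iterations, or which attention form the authors use, so those choices are assumptions here. The BERT embeddings, the capsule prediction vectors `u_hat` (normally produced by learned weight matrices) and the Bi-LSTM hidden states are taken as given inputs. The attention query, the classifier weights and every helper name (`dynamic_routing`, `attention_pool`, `toxic_probability`) are hypothetical. The sketch also assumes that the capsule outputs and the Bi-LSTM states have the same width. A real model would more likely add a learned projection and use a tensor library such as PyTorch.

```python
import math
import random
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement (assumed variant, after Sabour et al., 2017).

    u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    Returns one squashed output vector v_j per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for it in range(iterations):
        # Coupling coefficients: each lower capsule distributes 1.0 over upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        if it < iterations - 1:
            # Raise the logit where a prediction agrees (dot product) with the output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


def attention_pool(features: List[Vector], query: Vector) -> Vector:
    """Dot-product attention pooling: weight each vector by softmax(score), then sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]


def toxic_probability(local: List[Vector], global_states: List[Vector],
                      query: Vector, weights: Vector, bias: float) -> float:
    """Fuse local (capsule) and global (Bi-LSTM) vectors, then apply a sigmoid classifier."""
    fused = attention_pool(local + global_states, query)
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return sigmoid(logit)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Hypothetical prediction vectors: 6 lower capsules -> 3 upper capsules.
    u_hat = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)] for _ in range(6)]
    local = dynamic_routing(u_hat, iterations=3)
    # Hypothetical Bi-LSTM hidden states for 5 time steps, same width as the capsules.
    global_states = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
    query = [rng.uniform(-1, 1) for _ in range(dim)]     # hypothetical attention query
    weights = [rng.uniform(-1, 1) for _ in range(dim)]   # hypothetical classifier weights
    p = toxic_probability(local, global_states, query, weights, bias=0.0)
    print(f"P(toxic) = {p:.3f}")
```

The attention pooling step reduces the variable-length set of local and global vectors to a single vector. This is one plausible reading of the abstract's statement that fusion reduces the dimensionality of the representation.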

     
