Chinese Semantic Role Labeling Combined with Pooling Technology and Feature Groups

Zhu Ao (朱傲); Wan Fucheng (万福成); Ma Ning (马宁); Che Guoyi (车郭怡)

doi:10.7540/j.ynu.20200642

Abstract

Chinese Semantic Role Labeling (SRL) based on statistical machine learning has two main problems: manual feature extraction is tedious and inefficient, and models struggle to capture the contextual semantic information of long sentences. To address these problems, this paper proposes a BiLSTM-MaxPool-CRF fusion model for Chinese SRL and studies how to optimize its performance. First, multi-level linguistic features such as part of speech, argument markers, and phrase syntax are incorporated into the training corpus. Then, average pooling (AvgPool) is used to sample and select from the feature vector groups, eliminating redundant feature information. Finally, the results of multiple sets of experiments show that, compared with multi-level feature groups used without sampling, the features sampled through pooling significantly improve the performance of the sequence labeling model.
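The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes that each token carries one embedding vector per feature group (part of speech, argument marker, phrase syntax), that all groups share the same dimension, and that pooling is applied element-wise across groups. The function names, the group names, and the numeric values are hypothetical; the abstract does not specify the exact pooling layout, so this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]


def pool_feature_groups(groups: Dict[str, Vector], mode: str = "avg") -> Vector:
    """Element-wise pooling across same-sized feature-group vectors of one token.

    Assumption: pooling runs across groups, position by position.
    """
    vectors = list(groups.values())
    if not vectors:
        raise ValueError("at least one feature group is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature groups must share one dimension")
    columns = list(zip(*vectors))  # one tuple per embedding position
    if mode == "avg":
        return [sum(col) / len(vectors) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def concat_feature_groups(groups: Dict[str, Vector]) -> Vector:
    """Baseline without sampling: concatenate all feature groups."""
    return [x for v in groups.values() for x in v]


# Hypothetical 3-dimensional embeddings for a single token.
token_features = {
    "pos": [0.5, 1.0, -0.5],
    "argument_marker": [0.25, 0.0, 0.5],
    "phrase_syntax": [0.75, 0.5, 0.0],
}

avg_pooled = pool_feature_groups(token_features, "avg")
max_pooled = pool_feature_groups(token_features, "max")
unpooled = concat_feature_groups(token_features)
```

In this sketch the pooled vector keeps the dimension of a single group (3), while the unsampled concatenation grows with the number of groups (9), which is one way to read the abstract's claim that pooling removes redundant feature information before the sequence labeling layers.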
