Abstract:
Traditional keyphrase extraction algorithms tend to ignore the semantic information of a document, while deep learning methods do not fully exploit important statistical features of words. To address both problems, a keyphrase extraction algorithm based on global and local feature representation is proposed. First, a deep learning based keyphrase extraction framework is built from Transformer and convolutional neural network models: the global semantic feature representation of each word is computed through the multi-head attention mechanism, and the feature vector of each word is obtained by concatenating these semantic features with two statistical features, its part of speech and its word frequency. Second, a multi-layer convolutional neural network combined with dilated convolutions efficiently captures local word features and inter-word dependencies. Finally, keyphrase extraction is treated as a sequence labeling task to produce the final keyphrases. Parameter tuning and comparison experiments on two public corpora show that the algorithm outperforms existing mainstream keyphrase extraction algorithms. The F1 scores on the Inspec and kp20k datasets reach 49.87% and 35.77%, respectively, an effective improvement in the accuracy of automatic keyphrase extraction.
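
To make the final sequence labeling step concrete, the following is a minimal sketch of how per-word labels could be turned into keyphrases. It assumes a BIO tagging scheme (B = first word of a keyphrase, I = continuation, O = outside); the abstract does not state which scheme is used, and the function name `decode_keyphrases` and the example tokens and tags are hypothetical illustrations, not the authors' implementation or model output.

```python
from typing import List


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect maximal B/I runs as keyphrases (BIO scheme is an assumption)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts; close any phrase still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the currently open keyphrase.
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase: close any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical labels, standing in for the output of a tagging model.
example_tokens = ["dilated", "convolution", "captures", "local", "features"]
example_tags = ["B", "I", "O", "B", "I"]
example_phrases = decode_keyphrases(example_tokens, example_tags)
# example_phrases == ["dilated convolution", "local features"]
```

In this sketch a stray I label that does not follow a B is simply ignored; a real system might instead treat it as the start of a phrase, which is a design choice the abstract leaves open.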