Abstract:
Vietnamese is a low-resource language, and high-quality news corpora annotated with keyphrases are scarce. To address the low accuracy of Vietnamese news keyphrase generation when training samples are insufficient, a multi-feature fusion Vietnamese keyphrase generation model is proposed to improve the relevance of the generated keyphrases to Vietnamese news documents. First, entity, part-of-speech, and word-position features of the Vietnamese news text are concatenated with the word vectors, so that the vectors fed to the model carry richer, multi-dimensional semantic information. Second, a bidirectional attention mechanism is used to capture the dependence between the context and the news headline, strengthening the guiding role of the headline in keyphrase generation. Finally, a copy mechanism is incorporated into generation to improve the semantic relevance of the keyphrases. Experiments on the constructed Vietnamese news corpus show that the keyphrase generation model fused with multiple features can generate high-quality keyphrases with a limited Vietnamese training corpus. Compared with TG-Net, the F1@10 and R@50 scores are improved by 13.2% and 17.1%, respectively.
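
The feature-fusion step described above can be illustrated with a minimal sketch. It assumes that each token's entity tag, part-of-speech tag, and position are mapped to small embedding vectors and appended to the word vector. The table sizes, dimensions, random initialisation, and the name `fuse_features` are hypothetical choices made for illustration only; they are not taken from the proposed model.

```python
import random

random.seed(0)

def make_table(num_rows, dim):
    """Build a toy embedding table: one random vector of length `dim` per id."""
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

# Hypothetical sizes, chosen only for illustration (not from the paper).
WORD_DIM, ENTITY_DIM, POS_DIM, POSITION_DIM = 8, 2, 3, 2

word_table = make_table(100, WORD_DIM)          # toy vocabulary of 100 word ids
entity_table = make_table(5, ENTITY_DIM)        # 5 entity tags
pos_table = make_table(20, POS_DIM)             # 20 part-of-speech tags
position_table = make_table(50, POSITION_DIM)   # token positions 0..49

def fuse_features(word_ids, entity_ids, pos_ids):
    """Concatenate word, entity, POS, and position embeddings for each token."""
    fused = []
    for position, (w, e, p) in enumerate(zip(word_ids, entity_ids, pos_ids)):
        # List concatenation appends each feature vector to the word vector.
        fused.append(
            word_table[w] + entity_table[e] + pos_table[p] + position_table[position]
        )
    return fused

# Three tokens, each described by a word id, an entity tag id, and a POS tag id.
tokens = fuse_features(word_ids=[4, 17, 42], entity_ids=[0, 1, 0], pos_ids=[3, 7, 3])
```

Each fused vector has length `WORD_DIM + ENTITY_DIM + POS_DIM + POSITION_DIM`, so the encoder input grows only by the size of the added feature embeddings while the original word vector is left unchanged in the leading dimensions.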