Abstract:
Language recognition performance drops sharply on short speech, both because the segments are short and because training and test speech differ in duration. To address this, a variable-duration language identification model (VD-LID) for multi-language recognition of short broadcast speech is proposed. First, speech segments of different lengths are normalized to a common duration structure. Then, features are extracted from the structured short speech, and the logarithmic power spectrum envelope is used as the language feature. Finally, the language features are fed into a residual neural network for classification. Experimental results show that, compared with traditional feature inputs, the logarithmic power spectrum envelope feature raises the language recognition accuracy on short speech to 82.4%. Compared with a language recognition model without the time-normalization layer, VD-LID improves language recognition accuracy by 27.9% and 37.7% in the experiments with 5 s and 10 s speech durations, respectively.
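The feature pipeline summarized above can be illustrated with a minimal NumPy sketch. It assumes 16 kHz audio and 25 ms Hamming-windowed frames with a 10 ms hop. It computes the log power spectrum envelope by cepstral liftering, and it stands in for the time-normalization step by linearly resampling every utterance along the time axis to a fixed frame count. The liftering method, the function names and every parameter value are illustrative assumptions, not the authors' VD-LID implementation. The residual network classifier is omitted.

```python
import numpy as np


def log_power_spectrum_envelope(signal, frame_len=400, hop=160, n_fft=512, n_lifter=30):
    """Frame-wise log power spectrum envelope via cepstral liftering (assumed method)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping Hamming-windowed frames: (n_frames, frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Log power spectrum; the small epsilon avoids log(0) on silent frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    log_power = np.log(power + 1e-10)
    # Real cepstrum of each frame's log power spectrum (length n_fft, symmetric)
    cepstrum = np.fft.irfft(log_power, n=n_fft, axis=1)
    # Keep only low-quefrency coefficients (both symmetric halves) to smooth
    # away fine harmonic structure and leave the spectral envelope
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0
    return np.fft.rfft(cepstrum * lifter, n=n_fft, axis=1).real  # (n_frames, n_fft//2 + 1)


def time_normalize(features, target_frames=100):
    """Hypothetical stand-in for time normalization: linearly resample the
    time axis so utterances of any duration yield the same number of frames."""
    n = features.shape[0]
    src = np.linspace(0.0, 1.0, n)
    dst = np.linspace(0.0, 1.0, target_frames)
    return np.stack(
        [np.interp(dst, src, features[:, j]) for j in range(features.shape[1])], axis=1
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for seconds in (5, 10):
        x = rng.standard_normal(16000 * seconds)  # placeholder audio, not real speech
        feats = time_normalize(log_power_spectrum_envelope(x))
        print(seconds, feats.shape)  # prints "5 (100, 257)" then "10 (100, 257)"
```

Because every utterance ends up as a fixed-size (frames × frequency bins) matrix, 5 s and 10 s inputs can share one 2-D convolutional classifier such as a residual network.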