Abstract:
Existing language identification methods perform poorly in low signal-to-noise ratio (SNR) environments. To address this, a language identification method is proposed that fuses cochlear filter coefficients with the spectral parameters of the vocal tract impulse response, so that the resulting feature characterizes both human hearing and human speech production. First, cochlear filter coefficients, which simulate the auditory characteristics of the human ear, are extracted. Then the spectral parameters of the vocal tract impulse response, which characterize human vocalization, are extracted and fused with them. Finally, a Gaussian mixture model-universal background model (GMM-UBM) is used to evaluate the proposed method on language identification. The experimental results show that the method outperforms the comparison methods in all four SNR environments tested. Compared with the deep-learning-based logarithmic Mel-scale filter bank energy feature, identification accuracy improves by 16.1%, and the method also compares favorably with the other methods.
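
The pipeline summarized above (feature fusion followed by scoring against a background model) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the two feature streams are fused by frame-level concatenation, the models are diagonal-covariance Gaussian mixtures with already-trained parameters, and the decision uses the log-likelihood ratio between each language model and the background model. The function names `fuse_features`, `gmm_avg_loglik`, and `identify_language` are hypothetical, and feature extraction and model training are omitted.

```python
import numpy as np


def fuse_features(cochlear, vocal_tract):
    """Fuse two frame-level feature streams by concatenation (an assumption).

    cochlear:    array of shape (T, Dc), one row per frame
    vocal_tract: array of shape (T, Dv), one row per frame
    returns:     array of shape (T, Dc + Dv)
    """
    if cochlear.shape[0] != vocal_tract.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")
    return np.hstack([cochlear, vocal_tract])


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means: (K, D); variances: (K, D)
    """
    diff = frames[:, None, :] - means[None, :, :]  # (T, K, D)
    # Log-density of every frame under every Gaussian component.
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )  # (T, K)
    log_weighted = log_comp + np.log(weights)
    # Numerically stable log-sum-exp over the K components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())


def identify_language(frames, language_models, ubm):
    """Pick the language whose model scores highest relative to the background model.

    language_models: dict mapping a language label to (weights, means, variances)
    ubm:             (weights, means, variances) of the background model
    """
    ubm_ll = gmm_avg_loglik(frames, *ubm)
    scores = {
        lang: gmm_avg_loglik(frames, *params) - ubm_ll
        for lang, params in language_models.items()
    }
    return max(scores, key=scores.get), scores
```

In a typical GMM-UBM setup the language models are derived from the background model by adaptation on language-specific data; this sketch takes their parameters as given and shows only the scoring step.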