Abstract:
Different noise types have different spectral characteristics, and as a result the performance of automatic language identification degrades significantly in noisy environments. To address this problem, a language identification method based on improved time-domain gammatone filter cepstral coefficient (GFCC) features is proposed. First, time-domain GFCC features are extracted from a training set covering different noise backgrounds. Then, the Fisher ratio is used to measure the relative contribution of each feature dimension to language discrimination and to analyse how different noises affect each dimension; based on this analysis, a weight is designed for each dimension, yielding a feature set with better language-discriminative properties. Finally, a Gaussian mixture model-universal background model (GMM-UBM) is used as the baseline language identification system to test the performance of the proposed method. The experimental results show that, with a single noise background at a signal-to-noise ratio of −5 dB, the identification rate of the proposed method is higher than that of the traditional time-domain GFCC feature method by 40.1 percentage points for pink noise and by 20.6 percentage points for restaurant noise; the identification rate under the other noise backgrounds and signal-to-noise ratios is also improved to some extent.
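
The Fisher-ratio step can be illustrated with a minimal sketch. It assumes the common per-dimension definition (variance of the class means divided by the mean within-class variance) and a simple mean-normalised weighting rule. The abstract does not specify the weight design, so the function names `fisher_ratio` and `fisher_weights`, the normalisation, and the synthetic data standing in for GFCC features are all illustrative assumptions rather than the method proposed here.

```python
import numpy as np

def fisher_ratio(features, labels):
    """Per-dimension Fisher ratio: between-class variance / within-class variance.

    features: (n_frames, n_dims) array; labels: (n_frames,) language labels.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Mean and variance of every dimension, computed separately per language.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)   # spread of the class means
    within = class_vars.mean(axis=0)    # average spread inside each class
    return between / (within + 1e-12)   # small constant avoids division by zero

def fisher_weights(ratio):
    """Assumed weighting rule: scale the ratios so that their mean is 1."""
    return ratio / ratio.mean()

# Synthetic stand-in for GFCC features: 3 "languages", 4 dimensions.
# Only dimension 0 separates the classes; the rest are pure noise.
rng = np.random.default_rng(0)
n_per_lang = 200
offsets = np.array([[-2.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0, 0.0]])
features = np.concatenate(
    [rng.normal(loc=o, scale=1.0, size=(n_per_lang, 4)) for o in offsets]
)
labels = np.repeat(np.arange(3), n_per_lang)

ratio = fisher_ratio(features, labels)
weights = fisher_weights(ratio)
weighted = features * weights   # emphasise the discriminative dimensions
```

In this toy setting the weight for dimension 0 dominates, so the weighted features emphasise the one dimension that actually distinguishes the classes. With real data the ratios would be computed per noise condition before the weights are designed.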