Citation: ZHOU Dachun (周大春), SHAO Yubin (邵玉斌), ZHANG Haoge (张昊阁), LONG Hua (龙华), PENG Yi (彭艺). An improved GFCC algorithm for language recognition in noisy environments[J]. Journal of Yunnan University: Natural Sciences Edition (云南大学学报(自然科学版)), 2024, 46(2): 246-254. DOI: 10.7540/j.ynu.20220531

An improved GFCC algorithm for language recognition in noisy environments

Abstract: Different noise types have different spectral characteristics, which significantly degrades the performance of automatic language identification in noisy environments. To address this problem, a language identification method based on improved time-domain gammatone filter cepstral coefficient (GFCC) features is proposed. First, time-domain GFCC features are extracted from training sets with different noise backgrounds. Then, the Fisher ratio is used to compute the relative contribution of each feature dimension to language discrimination, the effect of different noises on each dimension of the time-domain GFCC features is analysed, and weights designed from this analysis are applied to each dimension to obtain a feature set with stronger language discrimination. Finally, a Gaussian mixture model-universal background model (GMM-UBM) is used as the baseline system for language identification to test the performance of the proposed method. Experimental results show that, with a single noise background at a signal-to-noise ratio of −5 dB, the proposed method improves the identification rate over the traditional time-domain GFCC feature method by 40.1 percentage points for pink noise and 20.6 percentage points for restaurant noise; the identification rate also improves to some extent under other noise backgrounds and signal-to-noise ratios.
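The Fisher-ratio weighting step described in the abstract can be illustrated with a minimal sketch. It assumes one common per-dimension definition of the Fisher ratio (variance of the class means divided by the mean within-class variance) and a hypothetical weighting rule that scales the ratios so the largest equals 1. The abstract does not give the authors' actual weight design, so the function names `fisher_ratio` and `fisher_weights`, the normalisation, and the synthetic two-class data below are all assumptions for illustration only.

```python
import numpy as np


def fisher_ratio(features, labels):
    """Per-dimension Fisher ratio (assumed definition):
    variance of class means / mean within-class variance."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)

    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        x = features[labels == c]           # frames belonging to class c
        between += (x.mean(axis=0) - overall_mean) ** 2
        within += x.var(axis=0)
    between /= len(classes)
    within /= len(classes)
    # Small constant guards against division by zero.
    return between / (within + 1e-12)


def fisher_weights(ratio):
    """Hypothetical weighting rule: scale ratios so the largest is 1.
    The paper's actual weight design may differ."""
    return ratio / ratio.max()


# Synthetic stand-in for feature frames from two languages (not real GFCCs).
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n)
# Dimension 0 separates the two classes; dimension 1 carries no class information.
dim0 = np.concatenate([rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n)])
dim1 = rng.normal(0.0, 1.0, 2 * n)
features = np.column_stack([dim0, dim1])

ratio = fisher_ratio(features, labels)
weights = fisher_weights(ratio)
weighted = features * weights               # per-dimension weighting via broadcasting
```

In this toy setting the discriminative dimension receives weight 1 and the uninformative one is scaled close to zero, which is the general effect the weighting step aims for: dimensions that contribute more to separating languages are emphasised before the features are passed to the classifier.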

     
