Voice activity detection method based on multi-feature fusion in noisy environments

Luo Siyang; Long Hua; Shao Yubin; Du Qingzhi

doi:10.7540/j.ynu.20200444


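Abstract

Traditional voice activity detection (VAD) methods are not robust in noisy environments and detect speech segments poorly. To address this, a VAD method based on multi-feature fusion is proposed. First, three features are extracted from the noisy speech signal: the sub-band spectral entropy, a projection feature based on Mel frequency cepstral coefficients (MFCC), and GFCC₀, the first coefficient of the Gammatone frequency cepstral coefficients, which is applied here to the VAD task. Next, the three features are combined by adaptive weighted fusion into a single feature suited to endpoint detection. Finally, the decision thresholds are estimated adaptively with fuzzy C-means clustering, and the speech endpoints are obtained with the double-threshold method. Compared with existing traditional methods, the proposed method achieves better detection results in all seven noise environments tested and improves VAD accuracy; in volvo noise in particular, the accuracy exceeds 94.5%.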

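The abstract names the stages of the pipeline but not their parameters. The Python/NumPy sketch below wires together three of them: sub-band spectral entropy, fuzzy C-means (FCM) threshold estimation, and a double-threshold decision. It is an illustration under stated assumptions, not the authors' implementation. The frame length (256), hop (128), 4 FFT bins per sub-band, the offset `k`, and the rule that places both thresholds between the two FCM centers (`alpha_high`, `alpha_low`) are all hypothetical choices. Frame log-energy stands in for the MFCC projection and GFCC₀ features, and fixed weights stand in for the paper's adaptive weighting.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal (len >= frame_len) into overlapping frames, one per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def subband_spectral_entropy(frames, band_size=4, k=1e-3):
    """Group FFT power bins into sub-bands, then take the entropy of the
    normalized sub-band energies. k (hypothetical) is a positive offset that
    keeps every probability strictly positive."""
    spec = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)) ** 2
    n_bands = spec.shape[1] // band_size
    bands = spec[:, : n_bands * band_size].reshape(len(frames), n_bands, band_size).sum(axis=2)
    p = (bands + k) / (bands + k).sum(axis=1, keepdims=True)
    return -(p * np.log(p)).sum(axis=1)


def minmax(f):
    return (f - f.min()) / (f.max() - f.min() + 1e-12)


def fuse(features, weights):
    """Weighted sum of min-max normalized features. The fixed, caller-supplied
    weights are a placeholder for the paper's adaptive weighting."""
    return sum(w * minmax(f) for w, f in zip(weights, features))


def fcm_1d(x, c=2, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means on scalar data; returns the cluster centers in ascending order."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        # u[i, n] = 1 / sum_k (d[i, n] / d[k, n]) ** (2 / (m - 1))
        new_u = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1))).sum(axis=1)
        converged = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if converged:
            break
    return np.sort(centers)


def double_threshold(feat, t_low, t_high, min_len=5):
    """Seed a segment wherever feat exceeds t_high, grow it in both directions
    while feat stays above t_low, and drop segments shorter than min_len frames."""
    segments, i, n = [], 0, len(feat)
    while i < n:
        if feat[i] <= t_high:
            i += 1
            continue
        start, end = i, i
        while start > 0 and feat[start - 1] > t_low:
            start -= 1
        while end + 1 < n and feat[end + 1] > t_low:
            end += 1
        if end - start + 1 >= min_len:
            segments.append((start, end))
        i = end + 1
    return segments


def detect(x, alpha_high=0.5, alpha_low=0.3):
    frames = frame_signal(x)
    entropy = subband_spectral_entropy(frames)
    log_energy = np.log((frames ** 2).sum(axis=1) + 1e-12)
    # Speech lowers spectral entropy, so negate it before fusing; log-energy is
    # only a stand-in for the MFCC projection and GFCC0 features.
    fused = fuse([-entropy, log_energy], weights=[0.5, 0.5])
    lo, hi = fcm_1d(fused)
    # Hypothetical rule: put both thresholds between the two FCM centers.
    t_high = lo + alpha_high * (hi - lo)
    t_low = lo + alpha_low * (hi - lo)
    return double_threshold(fused, t_low, t_high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    x = 0.05 * rng.standard_normal(fs)  # 1 s of white noise
    n = np.arange(2400, 4800)
    x[2400:4800] += 0.5 * np.sin(2 * np.pi * 500 * n / fs)  # tone burst standing in for speech
    print(detect(x))  # list of (first_frame, last_frame) index pairs
```

FCM is used here only to find one "noise" center and one "speech" center in the fused feature, so that the thresholds follow the noise floor of each recording instead of being fixed constants. The low threshold lets a detected segment extend into its weaker onset and offset frames, which is the usual motivation for the double-threshold method.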
