Single-channel audio separation based on a temporal convolutional generative adversarial network

郁文虎; 全海燕

doi:10.7540/j.ynu.20220110

Speech-music separation method based on joint training and a temporal convolutional generative adversarial network

Abstract

Speech and music often occur mixed together in audio signals, so many applications need to separate the two effectively. Conventional separation methods generally operate on frequency-domain signals, and reconstructing the signal from the frequency domain relies on phase information, which introduces deviations in the recovered signal. To address the poor separation of single-channel audio signals in the time domain, this paper proposes introducing joint training and temporal convolution into a generative adversarial network. First, the time-domain speech is preprocessed. Then the preprocessed data are fed into the generator of the temporal convolutional generative adversarial network for separation. Finally, the separated interference speech and the clean interference speech are passed to the discriminator of the generative adversarial network, and the discrimination result is fed back to the generator. The algorithm is evaluated on the MIR-1K and data_thchs30 datasets. The results show that the proposed single-channel separation model improves the PESQ and STOI scores by 0.31 and 0.07 on average, respectively, indicating that the proposed algorithm effectively improves the separation of speech and music in audio signals.
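The abstract does not give the layer configuration of the generator, so the following is only a minimal sketch of the generic building block behind temporal convolution: a causal, dilated 1-D convolution applied directly to time-domain samples. The function names, kernel values, and dilation schedule are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so each output depends
    only on the current and past inputs (causality).
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that would reach before the signal start
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input samples seen by one output sample after stacking one
    causal convolution per entry in `dilations` (hypothetical schedule)."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Two-tap kernel with dilation 2: y[t] = x[t] + x[t - 2]
    print(causal_dilated_conv1d(signal, kernel=[1.0, 1.0], dilation=2))
    # Four stacked layers with kernel size 3 and doubling dilations
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))
```

Doubling the dilation from layer to layer lets the receptive field grow quickly with depth, which is the usual reason for using such stacks on long raw waveforms without going through a frequency-domain transform.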
