Speech enhancement algorithm based on dual generator and frequency domain discriminator GAN

纪鹏威; 全海燕

doi:10.7540/j.ynu.20230308


Abstract

Under low signal-to-noise ratio (SNR) conditions, speech enhancement algorithms based on generative adversarial networks (GANs) struggle to capture the time-domain distribution of noisy speech. The speech signal is then swamped by noise, which degrades the model's enhancement performance and can distort the enhanced speech. To address this problem, a new GAN-based speech enhancement algorithm built on dual generators and a frequency-domain discriminator is proposed. First, the algorithm uses two generators with identical parameters that improve speech quality through a multi-stage enhancement mapping. Second, a self-attention layer is added to each generator on top of the original model to improve model performance and enhancement quality. Finally, the discriminator adopts a frequency-domain structure and uses distribution information in the frequency domain as the basis for judging how similar the enhanced speech is to clean speech. Experimental results show that, in speech enhancement tasks under low-SNR conditions, the proposed method achieves better enhancement than the comparison methods, with significant improvements in all evaluation metrics.
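The three components named in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the abstract gives no layer sizes or network details, so the frame length, the STFT settings, the linear/tanh layers, and every function name (`generator`, `dual_generator`, `log_magnitude_stft`, `discriminator`) are hypothetical. The sketch also assumes that "two generators with identical parameters" means one weight set reused by both stages, and it models the frequency-domain discriminator as a linear scorer over a log-magnitude STFT. Training (adversarial and reconstruction losses) is omitted; all weights are random.

```python
import numpy as np

# Hypothetical sizes for illustration only; the abstract does not specify them.
FRAME = 16   # samples per frame fed to the toy generator
N_FFT = 64   # STFT size used by the toy frequency-domain discriminator
HOP = 32     # STFT hop length

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention across frames; x has shape (T, C)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over key frames
    return x + attn @ v                          # residual connection

def init_generator(rng, c=FRAME):
    # A single parameter set; both stages reuse it (assumed weight sharing).
    return {name: rng.normal(0.0, 0.1, (c, c))
            for name in ("w_in", "wq", "wk", "wv", "w_out")}

def generator(params, wav):
    """Toy frame-wise generator: linear -> tanh -> self-attention -> linear -> tanh."""
    frames = wav.reshape(-1, FRAME)                    # (T, C)
    h = np.tanh(frames @ params["w_in"])
    h = self_attention(h, params["wq"], params["wk"], params["wv"])
    return np.tanh(h @ params["w_out"]).reshape(-1)    # back to a waveform in [-1, 1]

def dual_generator(params, noisy):
    """Two-stage mapping: the second stage refines the first stage's output."""
    stage1 = generator(params, noisy)
    stage2 = generator(params, stage1)
    return stage1, stage2

def log_magnitude_stft(wav, n_fft=N_FFT, hop=HOP):
    """Hann-windowed log-magnitude spectrogram, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(wav) - n_fft + 1, hop)
    frames = np.stack([wav[s:s + n_fft] * window for s in starts])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def init_discriminator(rng, n_bins=N_FFT // 2 + 1):
    return {"w": rng.normal(0.0, 0.1, n_bins), "b": 0.0}

def discriminator(params, wav):
    """Toy frequency-domain discriminator: scores a waveform only via its spectrogram."""
    spec = log_magnitude_stft(wav)
    logit = spec.mean(axis=0) @ params["w"] + params["b"]  # average over time, then linear
    return 1.0 / (1.0 + np.exp(-logit))                   # probability of "clean"

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)  # 440 Hz tone at 16 kHz
noisy = clean + rng.normal(0.0, 0.5, clean.shape)          # heavy noise, i.e. low SNR

g, d = init_generator(rng), init_discriminator(rng)
stage1, enhanced = dual_generator(g, noisy)
print(enhanced.shape, discriminator(d, enhanced), discriminator(d, clean))
```

The second stage receives the first stage's output rather than the noisy input, which is what makes the mapping multi-stage. The discriminator never sees the waveform directly, only its spectrogram, so it compares enhanced and clean speech by their frequency-domain distribution.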
