Abstract:
Under low signal-to-noise ratio (SNR) conditions, speech enhancement algorithms based on generative adversarial networks (GANs) struggle to capture the time-domain distribution of noisy speech. The speech signal is then masked by the noise, which degrades the enhancement performance of the model and can distort the enhanced speech. To address this problem, a new GAN speech enhancement algorithm based on dual generators and a frequency-domain discriminator is proposed. First, the algorithm employs two generators with the same parameters to improve speech quality through a multi-stage enhancement mapping. Second, a self-attention layer is added to each generator to improve model performance and the enhancement effect. Finally, the discriminator adopts a frequency-domain structure and uses the distribution information in the frequency domain as the basis for judging the similarity between enhanced speech and clean speech. Experimental results show that, in low-SNR environments, the proposed method achieves a better enhancement effect than the comparison method, and the evaluation metrics improve significantly.
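The data flow described above can be illustrated with a minimal sketch. It is not the proposed model. It assumes that "the same parameters" means both stages share one set of weights. The function `generator` is a hypothetical stand-in (a fixed 3-tap smoothing filter, with no self-attention layer) for the learned generator. The function `spectral_distance` is a hypothetical stand-in for the learned frequency-domain discriminator and only compares magnitude spectra. All names and values are illustrative.

```python
import cmath
import math


def generator(signal, weight):
    """Hypothetical stand-in for one generator: a 3-tap smoothing filter.

    `weight` plays the role of the (assumed shared) generator parameters.
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[i - 1] if i > 0 else signal[i]
        right = signal[i + 1] if i < n - 1 else signal[i]
        out.append((1 - weight) * signal[i] + weight * 0.5 * (left + right))
    return out


def two_stage_enhance(noisy, weight=0.5):
    """Multi-stage mapping: the second stage refines the first stage's output
    using the same parameters."""
    stage1 = generator(noisy, weight)
    stage2 = generator(stage1, weight)
    return stage1, stage2


def magnitude_spectrum(signal):
    """Magnitude of a naive DFT, keeping the non-negative frequency bins."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_distance(a, b):
    """Hypothetical stand-in for the frequency-domain discriminator:
    mean squared difference between the two magnitude spectra."""
    spec_a, spec_b = magnitude_spectrum(a), magnitude_spectrum(b)
    return sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)) / len(spec_a)


# Toy data: a low-frequency tone corrupted by high-frequency alternating noise.
n = 32
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [c + 0.3 * (-1) ** t for t, c in enumerate(clean)]

stage1, stage2 = two_stage_enhance(noisy)
print(spectral_distance(noisy, clean))
print(spectral_distance(stage1, clean))
print(spectral_distance(stage2, clean))
```

In a real system, the smoothing filter would be replaced by a trained network and the spectral distance by a trained discriminator operating on a time-frequency representation; the sketch only shows how the two stages are chained and where the frequency-domain comparison enters.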