Algorithm research of speech enhancement based on WGAN
Cite this article: WANG Yifei, HAN Jungang, FAN Lianghui. Algorithm research of speech enhancement based on WGAN[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, 31(1): 136-142.
Authors: WANG Yifei (王怡斐), HAN Jungang (韩俊刚), FAN Lianghui (樊良辉)
Affiliation: Xi'an University of Posts and Telecommunications, Xi'an 710121 (all three authors)
Funding: Key Program of the National Natural Science Foundation of China (61136002)
Abstract: Noisy speech can be viewed as a mixture of an independent noise signal and a speech signal. Traditional speech enhancement methods must make assumptions about the independence and feature distributions of the noise signal and the clean speech signal. Unreasonable assumptions lead to residual noise and speech distortion, and hence to poor enhancement. In addition, the randomness and abrupt changes of noise reduce the robustness of traditional methods. To address these problems, this paper uses a generative adversarial network for speech enhancement and presents a method based on the Wasserstein generative adversarial network (WGAN), a generative adversarial network built on the Wasserstein distance, to speed up and stabilize training. The method requires no hand-crafted acoustic features and improves the generalization ability of the speech enhancement system, performing well on both matched and unmatched noise sets. Experimental results show that the trained end-to-end speech enhancement model raises the perceptual evaluation of speech quality (PESQ) score, an objective measure of speech quality, by 23.97% on average.

Keywords: speech enhancement; generative adversarial nets; convolutional neural network; deep learning
Received: 2017-12-04
Revised: 2018-11-02
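
The abstract names the Wasserstein distance as the basis of the training objective but gives no formulas. As a rough illustration only, the sketch below writes out the critic and generator losses of the original WGAN formulation, together with its weight clipping step. It assumes the critic's scores are already available as plain lists of floats. The function names and the clipping threshold `c=0.01` are hypothetical, and the abstract does not confirm that the paper uses weight clipping rather than another way of constraining the critic.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimizing it pushes scores for clean speech
    above scores for generator output."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """WGAN generator loss: the generator tries to raise the critic's
    score on its enhanced outputs."""
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Clamp each critic weight to [-c, c], as in the original WGAN
    formulation (an assumption here, not confirmed by the abstract)."""
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    # Hypothetical critic scores for two clean and two enhanced utterances.
    real = [1.0, 3.0]
    fake = [0.0, 1.0]
    print(critic_loss(real, fake))      # -1.5
    print(generator_loss(fake))         # -0.5
    print(clip_weights([0.5, -0.2, 0.005]))  # [0.01, -0.01, 0.005]
```

Unlike the standard GAN objective, neither loss passes the scores through a logarithm or sigmoid, which is the usual explanation for the more stable training the abstract mentions.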
This article is indexed in Wanfang Data and other databases.