
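Single-channel audio separation based on a temporal convolutional generative adversarial network
Cite this article: YU Wen-hu, QUAN Hai-yan. Single-channel audio separation based on a temporal convolutional generative adversarial network[J]. Journal of Yunnan University (Natural Sciences), 2023, 45(1): 48-56. DOI: 10.7540/j.ynu.20220110
Authors: YU Wen-hu (郁文虎)  QUAN Hai-yan (全海燕)
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China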
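Abstract: Speech and music frequently overlap in audio signals, so many applications need to separate the two effectively. Conventional separation methods usually process the signal in the frequency domain, but restoring a frequency-domain signal depends on phase information, which biases the restored result. To address the poor separation of single-channel audio signals in the time domain, a method is proposed that introduces joint training and temporal convolution into a generative adversarial network. First, the time-domain speech is preprocessed; then the preprocessed data are fed to the generator of the temporal convolutional generative adversarial network for separation; finally, the separated interference speech and the clean interference speech are passed to the discriminator of the generative adversarial network, whose decision is fed back to the generator. The algorithm is evaluated on the MIR-1K and data_thchs30 datasets. The results show that the proposed single-channel audio separation model improves PESQ and STOI by 0.31 and 0.07 on average, respectively, demonstrating that the proposed algorithm effectively improves the separation of speech and music in audio signals.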

Keywords: temporal convolution   joint training   generative adversarial network   audio separation
Received: 2022-03-29

Speech music separation method based on joint training and timing convolution to generate confrontation network
YU Wen-hu,QUAN Hai-yan. Speech music separation method based on joint training and timing convolution to generate confrontation network[J]. Journal of Yunnan University(Natural Sciences), 2023, 45(1): 48-56. DOI: 10.7540/j.ynu.20220110
Authors: YU Wen-hu  QUAN Hai-yan
Affiliation: Faculty of Information Engineering & Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
Abstract: Because speech and music often appear mixed together in audio signals, many applications need to separate the two effectively. However, common separation methods generally operate on frequency-domain signals, and reconstructing the signal from the frequency domain requires phase information, which introduces deviations into the restored speech. Therefore, to address the poor separation of time-domain single-channel audio signals, this paper proposes introducing joint training and temporal convolution into a generative adversarial network. First, the time-domain speech is preprocessed. Then, the preprocessed data are sent to the generator of the temporal convolutional generative adversarial network for separation. Finally, the separated interference speech and the clean interference speech are sent to the discriminator of the generative adversarial network, and the discrimination results are fed back to the generator. The experiments use the MIR-1K and data_thchs30 datasets to test the algorithm's performance. The results show that the proposed single-channel audio separation model improves PESQ and STOI by 0.31 and 0.07 on average, respectively, which indicates that the proposed algorithm effectively improves the separation of speech and music in audio signals.
Keywords: temporal convolution    joint training    generative adversarial network    audio separation
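
The abstract does not describe the generator's layers, so the following is only a minimal sketch of the general idea behind temporal convolution: stacked dilated causal 1-D convolutions applied directly to waveform samples. The function names, the kernel weights, and the dilation schedule (1, 2, 4) are illustrative assumptions, not the authors' architecture.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal 1-D convolution with dilation (hypothetical helper).

    y[t] = sum_k kernel[k] * x[t - k * dilation], where samples before
    t = 0 are treated as zero, so the output has the same length as x
    and y[t] never depends on future samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def tcn_stack(x: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Apply one dilated causal convolution per dilation, each followed by ReLU.

    The same kernel is reused in every layer purely to keep the sketch short;
    a trained network would learn separate weights per layer.
    """
    out = list(x)
    for d in dilations:
        out = [max(0.0, v) for v in dilated_causal_conv1d(out, kernel, d)]
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a two-tap kernel and dilations 1, 2, 4, the receptive field is 1 + (2 - 1)(1 + 2 + 4) = 8 samples: a unit impulse fed through `tcn_stack` spreads over exactly eight output samples. Doubling the dilation at each layer is what lets a small number of layers cover a long stretch of audio.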