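Research on Speech Recognition Methods in Reverberant Sound Fields
Cite this article: Li Xue-Li, Xu Bo-Ling. Research on Speech Recognition Methods in Reverberant Sound Fields[J]. Journal of Nanjing University (Natural Sciences), 2003, 39(4): 525-531.
Authors: Li Xue-Li, Xu Bo-Ling
Affiliation: Institute of Acoustics, State Key Laboratory of Modern Acoustics, Nanjing University, Nanjing 210093
Funding: National Natural Science Foundation of China (60272037)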
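Abstract: Hands-free microphone speech recognition is one of the goals on the way to practical speech recognition. Building such a system first requires solving the reverberation problem caused by room effects. After discussing the characteristics of speech in an indoor reverberant sound field, this paper proposes a robust feature, the Filter-Normalized Mel Frequency Cepstrum Coefficient (FNMFCC): the MFCC parameters are lowpass filtered in the log power spectral domain, mean-subtracted in the cepstral domain, and nonlinearly normalized by weighting with the standard deviation. These three measures are used to remove the changes in the speech parameters caused by reverberation. Recognition uses vector quantization. With 4 groups of non-reverberant digit speech used for training, the average recognition rate over 150 groups of speaker-dependent digit utterances, covering the non-reverberant condition and four reverberant sound fields, reaches 98.7%. The proposed method raises the recognition rate in reverberant sound fields without lowering the rate for non-reverberant speech. Besides its high recognition rate, the method needs little computation and little memory, so it is easy to build into a small, practical, fast recognition system.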

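Keywords: speech recognition; reverberant sound field; room effect; filter normalization; Mel frequency cepstrum coefficient; vector quantization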

Speech Recognition Methods in Reverberant Environments
Li Xue-Li, Xu Bo-Ling. Speech Recognition Methods in Reverberant Environments[J]. Journal of Nanjing University: Nat Sci Ed, 2003, 39(4): 525-531.
Authors: Li Xue-Li, Xu Bo-Ling
Abstract: Hands-free speech communication is one of the practical goals of speech recognition development, and robustness to reverberation in automatic speech recognition is a key requirement for this application. This paper discusses the characteristics of reverberant speech and proposes a new robust feature, the Filter-Normalized Mel Frequency Cepstrum Coefficient (FNMFCC). Reverberation changes the magnitude, phase, and formants of speech. It also obscures the low-intensity parts, which reduces intelligibility. In the time domain, reverberant speech can be represented as the convolution of the clean speech with the room impulse response (RIR). In the log spectral domain, the speech spectrum varies slowly while reverberation causes rapid ripples, so a lowpass filter applied to the log power spectrum can partly reduce the ripples. In the cepstral domain, reverberant speech can be represented approximately as the sum of the cepstra of the clean speech and the RIR. Although the RIR is time-variant, it can be considered constant while an isolated word is spoken, so cepstral mean subtraction (CMS), which acts as a linear highpass filter, further removes the reverberant effect. Moreover, nonlinear normalization by dividing by the standard deviation, which acts as an automatic gain control, helps improve the recognition rate. MFCC provides an alternative representation of the speech spectrum that incorporates some aspects of audition, so it performs better than traditional coefficients at lower cost than some auditory coefficients. In short, the FNMFCC feature is obtained by lowpass filtering in the log power spectral domain, subtracting the mean in the cepstral domain, and applying nonlinear normalization, which together make the MFCC robust to reverberation. Vector quantization is used for fast recognition.
The codebook size is set to 64, and the LBG algorithm is applied to 4 groups of FNMFCC data from clean speech to generate the codebook. The average recognition rate is 98.7% for speaker-dependent digit speech (150 groups) across the clean environment and four reverberant environments. The experiments show that the new method is effective for reverberant speech recognition and that FNMFCC performs better than other feature vectors. The system reduces computational complexity without losing performance and is easy to implement in practical applications such as voice dialing.
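
The three processing steps named in the abstract (lowpass filtering of the log power spectrum, cepstral mean subtraction, and division by the standard deviation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fnmfcc_like`, the moving-average lowpass filter, its length, the number of cepstral coefficients, and the choice to drop the zeroth coefficient are all assumptions made for illustration. The input is assumed to be the log Mel power spectrum of one isolated word, with one row per frame.

```python
import numpy as np

def dct_ii_matrix(n_out, n_in):
    """Unnormalized DCT-II basis mapping n_in log energies to n_out cepstra."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def fnmfcc_like(log_mel, n_ceps=12, smooth_len=3, eps=1e-8):
    """Sketch of an FNMFCC-style feature for one isolated word.

    log_mel: array of shape (frames, bands), log Mel power spectrum.
    All parameter defaults are illustrative assumptions.
    """
    if smooth_len % 2 == 0:
        raise ValueError("smooth_len must be odd")
    log_mel = np.asarray(log_mel, dtype=float)

    # 1) Lowpass filter along the frequency axis to damp rapid spectral
    #    ripples. A moving average is an assumed stand-in for the filter.
    kernel = np.ones(smooth_len) / smooth_len
    pad = smooth_len // 2
    padded = np.pad(log_mel, ((0, 0), (pad, pad)), mode="edge")
    smoothed = np.stack(
        [np.convolve(row, kernel, mode="valid") for row in padded]
    )

    # 2) DCT to the cepstral domain, dropping c0 (an assumption).
    basis = dct_ii_matrix(n_ceps + 1, log_mel.shape[1])[1:]
    ceps = smoothed @ basis.T

    # 3) Cepstral mean subtraction over the frames of the word.
    ceps = ceps - ceps.mean(axis=0, keepdims=True)

    # 4) Normalize each coefficient by its standard deviation over time.
    return ceps / (ceps.std(axis=0, keepdims=True) + eps)
```

Because the smoothing and the DCT are both linear, any term that is constant over the word in the log spectrum becomes a constant cepstral offset and is removed in step 3. That matches the abstract's approximation of the RIR as fixed during one isolated word. It is an idealization, and real reverberation also leaves time-varying effects that this step does not remove.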
Keywords:reverberant environment  speech recognition  Filter-Normalized Mel Frequency Cepstrum Coefficient  vector quantization
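
The vector quantization step mentioned in the abstract and keywords can be sketched as follows. The abstract states only that the codebook size is 64 and that the codebook is generated from clean-speech features. The rest of this sketch is assumed for illustration: one codebook per word, squared Euclidean distortion, an additive splitting perturbation, and the helper names `nearest`, `lbg_codebook`, and `recognize`. The splitting procedure is the commonly used LBG (Linde-Buzo-Gray) scheme, not necessarily the exact variant used in the paper.

```python
import numpy as np

def nearest(vectors, codebook):
    """Return (index of nearest codeword, squared distance) per vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def lbg_codebook(vectors, size=64, perturb=0.01, n_iter=20):
    """Train a codebook by LBG splitting; size must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    vectors = np.asarray(vectors, dtype=float)
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every codeword into a pair shifted in opposite directions.
        codebook = np.concatenate([codebook + perturb, codebook - perturb])
        # Refine with nearest-neighbor assignment and centroid updates.
        for _ in range(n_iter):
            idx, _ = nearest(vectors, codebook)
            for k in range(len(codebook)):
                members = vectors[idx == k]
                if len(members):  # keep the old codeword if a cell is empty
                    codebook[k] = members.mean(axis=0)
    return codebook

def recognize(features, codebooks):
    """Return the label whose codebook gives the lowest mean distortion."""
    scores = {
        label: nearest(features, cb)[1].mean()
        for label, cb in codebooks.items()
    }
    return min(scores, key=scores.get)
```

In use, `codebooks` would map each vocabulary word to a codebook trained on that word's clean-speech feature frames, and `recognize` would be called with the feature frames of an unknown utterance. Recognition then needs only distance computations against fixed tables, which is consistent with the low computation and memory cost that the abstract claims for the method.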
This article is indexed in databases including CNKI, VIP, and Wanfang Data.