
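Robust speech recognition based on statistical thresholding
Cite this article: LI Yin-guo, PU Fu-an, ZHENG Fang. Robust speech recognition based on statistical thresholding[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2012, 24(2): 127-132.
Authors: LI Yin-guo  PU Fu-an  ZHENG Fang
Affiliations: 1. Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065
2. Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065; Speech and Language Research Center, Tsinghua University, Beijing 100084
3. Speech and Language Research Center, Tsinghua University, Beijing 100084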
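Abstract: Over recent decades, speech recognition systems have moved from the laboratory into the real world. Under varying environmental noise, however, recognition performance remains unsatisfactory, especially at low signal-to-noise ratios (SNRs). To address the low recognition rate at low SNR, a cepstral mean and variance normalization algorithm based on statistical thresholding is proposed, built on the MFCC (Mel-frequency cepstrum coefficient) acoustic features. The algorithm...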

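Keywords: robustness  feature extraction  mean subtraction  mean and variance normalization (MVN)  Mel-frequency cepstrum coefficient (MFCC)  statistical thresholding  speech recognition
Received: 2012/4/13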

Statistical thresholding for robust ASR
LI Yin-guo, PU Fu-an, Thomas Fang ZHENG. Statistical thresholding for robust ASR[J]. Journal of Chongqing University of Posts and Telecommunications, 2012, 24(2): 127-132.
Authors: LI Yin-guo  PU Fu-an  Thomas Fang ZHENG
Institution: Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China
Abstract: Speech recognition systems have been deployed in real-world applications for several decades, yet recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNRs). In this paper, we propose a statistical thresholding method for the mean and variance normalization technique that further reduces the mismatch between training and testing environments, making an automatic speech recognition (ASR) system more robust to environmental changes. Mel-frequency cepstrum coefficient (MFCC) features are extracted as acoustic features and then normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features. The proposed statistical thresholding method is then applied. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the proposed approach reduced the error rate by over 40% in some cases compared with the baseline MFCC front-end, and under low-SNR conditions it also outperformed other robust features such as cepstral mean subtraction (CMS) and CMVN.
Keywords: robustness  feature extraction  mean subtraction  mean and variance normalization (MVN)  Mel-frequency cepstrum coefficient (MFCC)  statistical thresholding  speech recognition
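
For readers unfamiliar with the two baseline normalizations named in the abstract, the following is a minimal sketch of cepstral mean subtraction (CMS) and cepstral mean and variance normalization (CMVN) applied per utterance. It is not the authors' implementation: the function names `cms` and `cmvn`, the `eps` guard, and the toy `mfcc` values are assumptions made for illustration. The statistical thresholding step is not specified in the abstract and is therefore not reproduced here.

```python
import math


def cms(frames):
    """Cepstral mean subtraction: subtract each coefficient's utterance-level mean."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def cmvn(frames, eps=1e-10):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient.

    eps (an assumed guard value) avoids division by zero for constant coefficients.
    """
    centered = cms(frames)
    n = len(centered)
    dims = len(centered[0])
    # Population standard deviation of each (already centered) coefficient.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dims)]
    return [[f[d] / max(stds[d], eps) for d in range(dims)] for f in centered]


# Toy "MFCC" matrix: 3 frames x 2 coefficients (made-up values, not real features).
mfcc = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(mfcc)
```

After `cmvn`, each coefficient track has zero mean and unit variance over the utterance, which is what reduces the scale and offset mismatch between training and testing conditions; `cms` alone removes only the offset.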
This article is indexed in databases including Wanfang Data (万方数据).