Statistical thresholding for robust ASR (基于统计阈值的鲁棒性语音识别)


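Cite this article:	LI Yin-guo (李银国), PU Fu-an (蒲甫安), ZHENG Fang (郑方). Statistical thresholding for robust ASR [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2012, 24(2): 127-132.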

Authors:	LI Yin-guo (李银国), PU Fu-an (蒲甫安), ZHENG Fang (郑方)

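Affiliations:	1. Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065, and Research Center for Speech and Language, Tsinghua University, Beijing 100084; 3. Research Center for Speech and Language, Tsinghua University, Beijing 100084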

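Abstract (translated from the Chinese):	In recent decades, speech recognition systems have moved from the laboratory into the real world. Under varying environmental noise, however, recognition performance remains unsatisfactory, especially in low signal-to-noise ratio (SNR) environments. To address the low recognition rate at low SNR, a cepstral mean and variance normalization algorithm based on statistical thresholding is proposed, built on the Mel-frequency cepstrum coefficient (MFCC) acoustic features. The algorithm...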
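Keywords:	robust feature extraction; mean subtraction; mean and variance normalization (MVN); Mel-frequency cepstrum coefficient (MFCC); statistical thresholding; speech recognition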
Received:	2012/4/13
Statistical thresholding for robust ASR

LI Yin-guo, PU Fu-an, Thomas Fang ZHENG. Statistical thresholding for robust ASR [J]. Journal of Chongqing University of Posts and Telecommunications, 2012, 24(2): 127-132.

Authors:	LI Yin-guo, PU Fu-an, Thomas Fang ZHENG

Institution:	Automotive Electronics and Embedded Systems Engineering R&D Center, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R.China

Abstract:	Speech recognition systems have been deployed in real-world applications for several decades, yet recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNRs). In this paper, we propose a statistical thresholding method for the mean and variance normalization technique. It further reduces the mismatch between training and testing environments, making an automatic speech recognition (ASR) system more robust to environmental changes. Mel-frequency cepstrum coefficient (MFCC) features are extracted as acoustic features and normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features. The proposed statistical thresholding method is then applied. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the proposed approach reduced the error rate by over 40% in some cases compared with the baseline MFCC front-end, and under low-SNR conditions it also outperformed other robust features such as cepstral mean subtraction (CMS) and CMVN.
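
The two baseline normalizations named in the abstract, CMS and CMVN, can be sketched as follows. This is a minimal illustration, assuming per-utterance statistics and a feature matrix stored as a list of frames; the function names `cms` and `cmvn`, the `eps` guard, and the toy numbers are assumptions made for this example, not taken from the paper. The abstract does not specify how the statistical threshold is computed or applied, so that step is deliberately left out.

```python
import math


def cms(frames):
    """Cepstral mean subtraction: remove each coefficient's mean over the utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def cmvn(frames, eps=1e-8):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient."""
    n = len(frames)
    dims = len(frames[0])
    centered = cms(frames)
    # Population standard deviation of each (already centered) coefficient track.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dims)]
    # eps (an assumption of this sketch) avoids division by zero for a constant track.
    return [[f[d] / (stds[d] + eps) for d in range(dims)] for f in centered]


# Toy "utterance": 4 frames of 2 cepstral coefficients (made-up values).
utterance = [[1.0, 10.0], [2.0, 14.0], [3.0, 18.0], [4.0, 22.0]]
normalized = cmvn(utterance)
```

After `cmvn`, each coefficient track has approximately zero mean and unit variance across the utterance, which is what reduces the mismatch between training and testing conditions; the thresholding proposed in the paper would operate on features normalized in this way.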

Keywords:	robust feature extraction; mean subtraction; mean and variance normalization; Mel-frequency cepstrum coefficient (MFCC); statistical thresholding; speech recognition
This article is indexed in Wanfang Data (万方数据) and other databases.
