REMOS-based method for model domain compensation in remote speech recognition


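Cite this article:	YANG Yong, LI Jinsong, SUN Mingwei. REMOS-based method for model domain compensation in remote speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(1): 117-123.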

Authors:	YANG Yong, LI Jinsong, SUN Mingwei

Affiliation (all authors):	Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065

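Funding:	Natural Science Foundation of Chongqing (CSTC 2007BB2445); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ110522); Research Fund of Chongqing University of Posts and Telecommunications (A2009-26)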

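Abstract:	In enclosed environments, distant-talking speech recognition is degraded by reverberation, which lowers the recognition rate. Reverberation modeling for speech recognition (REMOS) is a new method that compensates for reverberation in the model domain; it effectively improves distant-talking recognition accuracy when the position of the sound source is known. In practical applications, however, the source position is often hard to predict. Using the maximum a posteriori principle and the idea of compensating different regions of a room differently, and building on frame-wise hidden Markov model (HMM) compensation, this paper proposes a new model compensation method for enclosed environments. The method applies the K-means clustering algorithm to an optimized set of room impulse responses (RIRs), merges the reverberation models that fall into the same cluster, and loads the merged reverberation models into the Viterbi algorithm to compensate the clean-speech HMMs frame by frame. Finally, a posterior-probability criterion selects the best compensation, so that the model-domain reverberation compensation comes as close as possible to exact compensation. Experiments show that the method further improves distant-talking speech recognition accuracy.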
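The clustering-and-merging step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features describe each RIR or how the models within a cluster are merged, so the 2-D `rir_features`, the per-position `reverb_models` vectors, the farthest-first initialization, and merging by element-wise averaging are all assumptions made for illustration.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    # Initialization (assumption): start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def merge_models(models, labels):
    """Average the model vectors that share a cluster label (assumed merge rule)."""
    merged = {}
    for c in sorted(set(labels)):
        members = [m for m, l in zip(models, labels) if l == c]
        merged[c] = [sum(col) / len(members) for col in zip(*members)]
    return merged


# Hypothetical data: one feature vector and one model vector per RIR position.
rir_features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
reverb_models = [[1.0, 0.5], [3.0, 1.5], [10.0, 4.0], [12.0, 6.0]]

labels, _ = kmeans(rir_features, k=2)
merged = merge_models(reverb_models, labels)
print(labels)
print(merged)
```

With this toy data the two nearby positions share one merged model and the two distant positions share another, so the recognizer would only need to try one compensation per room region instead of one per measured RIR.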
Keywords:	reverberation; reverberation modeling (REMOS); K-means; room impulse response; model compensation
Received:	2012/11/6
Revised:	2013/12/26
REMOS-based method for model domain compensation in remote speech recognition

YANG Yong, LI Jinsong and SUN Mingwei. REMOS-based method for model domain compensation in remote speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications, 2014, 26(1): 117-123.

Authors:	YANG Yong, LI Jinsong and SUN Mingwei

Abstract:	Distant-talking speech recognition in an enclosed environment is affected by reverberation, and the recognition rate drops considerably as a result. Reverberation modeling for speech recognition (REMOS) is a new method for reverberation compensation in the model domain; it effectively improves distant-talking speech recognition accuracy when the sound source location is known. In real applications, however, the location of the sound source is hard to predict. Based on the maximum a posteriori principle and frame-wise hidden Markov model (HMM) compensation, this paper proposes a new model compensation method for enclosed environments. The method uses the K-means clustering algorithm to cluster optimized sets of room impulse responses (RIRs) and merges the reverberation models that belong to the same cluster; the merged models are then loaded into Viterbi decoding, and frame-wise compensation is applied to the clean-speech HMMs. Finally, the best compensated model is selected by maximum a posteriori estimation, which brings the model-domain reverberation compensation as close as possible to exact compensation. Experimental results show that the method further improves distant-talking speech recognition accuracy.
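The final selection step can be sketched as a maximum a posteriori choice among the cluster-level compensations. The sketch below is illustrative only and assumes details the abstract does not give: each merged reverberation model is taken to yield one log-likelihood for the utterance (for example, a Viterbi score), each cluster has a prior probability, and the function name `select_map` and the cluster names are hypothetical.

```python
from math import exp, log


def select_map(log_likelihoods, priors):
    """Return the cluster with the highest posterior and all posteriors.

    log_likelihoods: cluster -> log p(observation | cluster-compensated model)
    priors:          cluster -> prior probability of that cluster (sums to 1)
    """
    # Unnormalized log posterior: log-likelihood plus log prior.
    scores = {c: ll + log(priors[c]) for c, ll in log_likelihoods.items()}
    # Normalize with the log-sum-exp trick to avoid underflow on long utterances.
    top = max(scores.values())
    log_norm = top + log(sum(exp(s - top) for s in scores.values()))
    posteriors = {c: exp(s - log_norm) for c, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical scores for three room regions.
scores = {"near": -120.0, "mid": -118.0, "far": -125.0}
uniform = {"near": 1 / 3, "mid": 1 / 3, "far": 1 / 3}
skewed = {"near": 0.9, "mid": 0.05, "far": 0.05}

best_uniform, post_uniform = select_map(scores, uniform)
best_skewed, post_skewed = select_map(scores, skewed)
print(best_uniform, best_skewed)
```

With a uniform prior the choice reduces to picking the highest likelihood; a strong prior for one region can override a small likelihood gap, which is the practical difference between maximum a posteriori and maximum likelihood selection.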

Keywords:	reverberation; reverberation modeling (REMOS); K-means; room impulse response; model compensation
