
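Research on Mandarin-English mixed bilingual speech recognition
Cite this article:ZHANG Qing-qing,PAN Jie-lin,YAN Yong-hong.Research on Mandarin-English mixed bilingual speech recognition[J].Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition),2008,20(4):391-396.
Authors:ZHANG Qing-qing  PAN Jie-lin  YAN Yong-hong
Institution (all authors):ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080
Funding:National High Technology Research and Development Program; National Key Basic Research Development Program; National Natural Science Foundation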
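Abstract:This paper introduces a Mandarin-English bilingual recognition system developed for the Mandarin-English mixing that occurs in song retrieval. Mixed bilingual speech recognition faces two main problems: (1) controlling system complexity while maintaining bilingual recognition accuracy; (2) effectively handling the non-native accent that the matrix language induces in embedded-language speech. To handle language mixing and reduce the amount of data needed for statistical modeling, a unified bilingual recognition system is built through phone merging and clustering. For the clustering step, a novel two-pass phone clustering algorithm based on the confusion matrix (TCM) is proposed and compared with clustering based on an acoustic likelihood criterion. Experiments show that recognition with TCM phone clustering outperforms recognition with acoustic-likelihood phone clustering. The final Mandarin-English bilingual system reduces the phrase error rate (PER) by 7.19% relative to the baseline monolingual English system on the English-only test set and by 13.78% relative to the baseline mixed model on the mixed bilingual test set, while maintaining the performance of the baseline monolingual Mandarin system on the Mandarin-only test set.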
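
The PER figures quoted in the abstract are relative reductions, that is, the drop in error rate expressed as a percentage of the baseline error rate rather than a difference in percentage points. The sketch below shows that arithmetic only. The function name and the sample error rates are hypothetical and are not taken from the paper, which reports only the relative figures here.

```python
def relative_reduction(baseline_per: float, new_per: float) -> float:
    """Return the relative error-rate reduction as a percentage of the baseline."""
    if baseline_per <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_per - new_per) / baseline_per


# Hypothetical error rates for illustration (not values from the paper):
# a drop from 20.0% to 18.0% PER is 2 points absolute, 10% relative.
example = relative_reduction(20.0, 18.0)
print(round(example, 2))
```

Under this definition, a 7.19% relative reduction means the new PER is about 0.928 times the baseline PER, whatever the baseline value is.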

Keywords:bilingual recognition  clustering algorithm  adaptation
Received:2008/3/17

Development of a Mandarin English bilingual speech recognition system
ZHANG Qing-qing,PAN Jie-lin,YAN Yong-hong.Development of a Mandarin English bilingual speech recognition system[J].Journal of Chongqing University of Posts and Telecommunications,2008,20(4):391-396.
Authors:ZHANG Qing-qing  PAN Jie-lin  YAN Yong-hong
Institution:ThinkIT Speech Laboratory, Institute of Acoustics of Chinese Academy of Sciences, Beijing 100080, P.R.China
Abstract:This paper introduces a Mandarin-English bilingual speech recognition system developed to handle Mandarin-English mixing in song retrieval. Real-world bilingual speech recognition poses two main difficulties: first, balancing performance on inter- and intra-sentential language switching while limiting system complexity; second, dealing effectively with matrix-language accents in embedded-language speech. To handle intra-sentential switching and reduce the amount of data required to estimate statistical models robustly, a single compact bilingual acoustic model, derived by phone set merging and clustering, is used instead of two separate monolingual models. For the clustering step, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with a log-likelihood measure. Experiments show that TCM achieves better performance. On English utterances, the phrase error rate (PER) of the bilingual system (MESRS) was 7.19% lower, in relative terms, than that of the baseline monolingual English system, while its PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. On bilingual utterances, the system achieved a 13.78% relative PER reduction over the baseline mixed model.
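
The abstract does not describe the two passes of TCM, so the following is not the authors' algorithm. It is a simplified, single-pass sketch of the general idea of confusion-matrix-based phone merging: an embedded-language phone is folded into a matrix-language phone when a recognizer confuses the two often enough. The function name, the phone labels, the counts, and the threshold are all hypothetical.

```python
def merge_by_confusion(english_phones, mandarin_phones, counts, threshold=0.6):
    """Map each English phone to a Mandarin phone or keep it separate.

    counts[i][j] is how often English phone i was recognized as Mandarin
    phone j (a hypothetical confusion matrix). A phone is merged only if
    its most frequent confusion covers at least `threshold` of its row.
    """
    mapping = {}
    for i, en in enumerate(english_phones):
        row = counts[i]
        total = sum(row)
        if total == 0:
            mapping[en] = en  # no evidence: keep the phone as its own unit
            continue
        best = max(range(len(row)), key=row.__getitem__)
        share = row[best] / total
        mapping[en] = mandarin_phones[best] if share >= threshold else en
    return mapping


# Hypothetical labels and counts, for illustration only.
english = ["EN_s", "EN_th", "EN_m"]
mandarin = ["MA_s", "MA_m"]
confusions = [[90, 10], [40, 35], [5, 95]]
mapping = merge_by_confusion(english, mandarin, confusions)
print(mapping)
```

In this toy input, `EN_s` and `EN_m` are confused with a single Mandarin phone 90% and 95% of the time and are merged, while `EN_th` has no dominant confusion (about 53%) and stays a separate unit. A higher threshold keeps more language-specific phones and yields a larger merged phone set.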
Keywords:bilingual speech recognition  clustering algorithm  adaptation
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).