Structure optimization and computing acceleration for convolutional neural network acoustic models
Citation: WANG Zhichao, XU Ji, ZHANG Pengyuan, YAN Yonghong. Structure optimization and computing acceleration for convolutional neural network acoustic models[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2018, 30(3): 416-422.
Authors: WANG Zhichao, XU Ji, ZHANG Pengyuan, YAN Yonghong
Affiliations: Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
Funding: The National Natural Science Foundation of China (11461141004, 11590774); the Strategic Priority Research Program of the Chinese Academy of Sciences, "New-Generation Information Technology Research for Sensing China" (XDA06030100, XDA06040603); the National "863" Program (2015AA016306); the National "973" Program (2013CB329302)
Abstract: Convolutional neural network (CNN) acoustic models are applied to a Chinese large-vocabulary continuous telephone speech recognition task, and the influence of variables such as the number of convolutional layers and the filter parameters on model performance is analyzed. In the Chinese telephone speech recognition test, the CNN model achieves an absolute 1.2% reduction in character error rate compared with a conventional fully connected neural network model. Because of the complexity of the convolutional structure, conventional neural network acceleration methods such as fixed-point quantization and SSE instructions accelerate convolution only inefficiently. The convolutional structure is therefore optimized, and two convolution vectorization methods are proposed: weight-matrix vectorization and input-matrix vectorization. Results show that input-matrix vectorization is the more efficient of the two; combined with a strategy of moving the activation function after the convolution, it speeds up the convolution computation by a factor of 8.9 with no performance degradation.

Keywords: automatic speech recognition; acoustic models; convolutional neural networks; vectorization
Received: 2016-11-14
Revised: 2017-05-18

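The "input matrix vectorization" the abstract describes is, in spirit, the standard im2col reformulation: unfold each filter-sized patch of the input into a row, so that the whole convolution collapses into a single matrix product that SIMD instructions and BLAS libraries handle far better than nested loops. A minimal NumPy sketch of that idea follows; the function names and the single-channel 2-D setting are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a 2-D input into one row of a matrix,
    so convolution with any filter becomes a single matrix product."""
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_as_gemm(x, filters):
    """Valid 2-D convolution (correlation) via im2col + one GEMM.

    x:       (h, w) input feature map
    filters: (n, kh, kw) filter bank
    returns: (n, out_h, out_w) output feature maps
    """
    n, kh, kw = filters.shape
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = im2col(x, kh, kw)               # (out_h*out_w, kh*kw)
    weights = filters.reshape(n, kh * kw)  # (n, kh*kw)
    out = cols @ weights.T                 # one GEMM replaces the nested loops
    return out.T.reshape(n, out_h, out_w)
```

Weight-matrix vectorization would instead reorganize the filter bank around a fixed input layout; the abstract reports the input-matrix form to be the faster of the two on this task. The activation-retroposition strategy it mentions is compatible with this layout, since the nonlinearity can be applied once to the GEMM output.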
This article is indexed by Wanfang Data and other databases.