Funding: Science and Technology Plan Project of Changshou District, Chongqing (CS2020007)

Received: 18 December 2020
Revised: 25 May 2022

Research on the DenseNet model for speech recognition
LIU Xiangde, WANG Yunqiu, JIANG Qin, ZHANG Yi, HE Xiangpeng. Research on the DenseNet model for speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2022, 34(4): 604-611.
Authors: LIU Xiangde, WANG Yunqiu, JIANG Qin, ZHANG Yi, HE Xiangpeng
Institution: School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China; School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
Abstract: To address the loss of low-level features, the large number of parameters, and the training difficulty that arise as networks deepen in speech recognition, we propose an improved densely connected convolutional neural network (DenseNet) model based on the asymmetric convolution idea of the Inception V3 network. Exploiting the long-term correlation of speech, the model uses dense connection blocks to establish connections between different layers, which preserves low-level features and strengthens feature propagation; to obtain acoustic features at richer scales, the range of the convolution kernel is enlarged. In addition, asymmetric convolution is used to factorize the convolution kernel and reduce the number of parameters. Experimental results show that, compared with the classic deep residual convolutional neural network model and the original DenseNet model, the proposed model achieves better speech recognition performance on the THCHS30 dataset; while maintaining the recognition rate, it reduces the number of network parameters and improves training efficiency.
Keywords: speech recognition; asymmetric convolution; training efficiency; convolutional neural network
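
The abstract combines two ideas: DenseNet-style dense connectivity (each layer's output is concatenated with its input so low-level features reach later layers) and Inception V3-style asymmetric convolution (a k x k kernel factorized into 1 x k and k x 1 kernels to cut parameters). The following is a minimal PyTorch sketch of these two mechanisms, not the authors' implementation; the kernel size, growth rate, layer count, and input shape are illustrative assumptions.

# Minimal sketch (assumed hyperparameters, not the paper's code):
# a DenseNet-style layer whose convolution is factorized asymmetrically.
import torch
import torch.nn as nn

class AsymmetricDenseLayer(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        # Asymmetric (factorized) convolution: 1 x k followed by k x 1.
        # A k x k conv needs k*k*C_in*C_out weights; this pair needs roughly 2*k of them.
        self.conv_1xk = nn.Conv2d(in_channels, growth_rate,
                                  kernel_size=(1, kernel_size), padding=(0, pad), bias=False)
        self.conv_kx1 = nn.Conv2d(growth_rate, growth_rate,
                                  kernel_size=(kernel_size, 1), padding=(pad, 0), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv_kx1(self.conv_1xk(self.relu(self.bn(x))))
        # Dense connectivity: concatenate the new feature maps with the input so
        # low-level features are preserved and passed on to every later layer.
        return torch.cat([x, out], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, num_layers: int, in_channels: int, growth_rate: int = 32, kernel_size: int = 5):
        super().__init__()
        self.layers = nn.ModuleList(
            AsymmetricDenseLayer(in_channels + i * growth_rate, growth_rate, kernel_size)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

if __name__ == "__main__":
    # Toy input shaped like a batch of spectrogram features: (batch, channels, time, frequency).
    features = torch.randn(2, 64, 100, 40)
    block = DenseBlock(num_layers=4, in_channels=64)
    print(block(features).shape)  # torch.Size([2, 192, 100, 40]): 64 + 4*32 channels

The parameter saving is the point of the factorization: with a 5 x 5 kernel, the pair (1 x 5, 5 x 1) uses 10 weights per input-output channel pair instead of 25, which is why the paper can enlarge the kernel range while still reducing the total parameter count.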