Research and Implementation of Speech Emotion Recognition Based on CGRU Model

Citation:	ZHENG Yan, CHEN Jia-nan, WU Fan, FU Bin. Research and Implementation of Speech Emotion Recognition Based on CGRU Model[J]. Journal of Northeastern University (Natural Science), 2020, 41(12): 1680-1685.

Authors:	ZHENG Yan, CHEN Jia-nan, WU Fan, FU Bin

Institution:	School of Information Science & Engineering, Northeastern University, Shenyang 110819, China

Funding:	National Natural Science Foundation of China (61773108)

Received:	2020-02-05

Revised:	2020-02-05

Abstract:	Speech emotion recognition is an important research direction in affective computing and human-computer interaction. Deep neural networks are now widely used to extract emotional features from speech, but which network model to use and how to alleviate overfitting still require further study. To address these problems, a CGRU model is proposed that combines a one-dimensional convolutional neural network (CNN) with gated recurrent units (GRU). The model extracts low-order and high-order emotional features from the MFCC features of the raw speech signal, and a random forest is then used for feature selection. On three public emotional speech corpora, EMODB, SAVEE, and RAVDESS, it achieves recognition accuracies of 79%, 69%, and 75%, respectively. Data augmentation, which enlarges the sample set by adding Gaussian noise and changing the speed, further improves recognition accuracy. An online recognition system verifies the usability of the model in a real environment.

Keywords:	speech emotion recognition; Mel-frequency cepstral coefficients; CGRU model; random forest; data augmentation
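The abstract names two augmentation operations, adding Gaussian noise and changing the speed, but gives no parameters. The following is a minimal sketch under assumed settings, not the authors' implementation: the function names, the noise standard deviation of 0.005, the speed rates 0.9 and 1.1, and the use of linear-interpolation resampling are all illustrative assumptions. A clip is represented as a plain list of float samples.

```python
import random


def add_gaussian_noise(signal, noise_std=0.005, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added per sample.

    `noise_std` is an assumed value; the abstract does not state one.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def change_speed(signal, rate):
    """Resample `signal` by linear interpolation so it plays `rate` times faster.

    rate > 1 shortens the clip, rate < 1 lengthens it. This simple
    resampling also shifts pitch; it is one possible reading of
    "changing the speed", not necessarily the one used in the paper.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not signal:
        return []
    last = len(signal) - 1
    new_len = max(1, int(round(len(signal) / rate)))
    out = []
    for i in range(new_len):
        pos = i * rate                 # fractional index into the original clip
        lo = min(int(pos), last)       # left neighbour, clamped to the clip
        hi = min(lo + 1, last)         # right neighbour, clamped to the clip
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def augment(signal, noise_std=0.005, rates=(0.9, 1.1), seed=None):
    """Return the original clip, one noisy copy, and one copy per speed rate."""
    variants = [list(signal), add_gaussian_noise(signal, noise_std, seed)]
    variants.extend(change_speed(signal, r) for r in rates)
    return variants
```

With the default arguments, `augment(clip)` returns four clips for every input clip, each of which would keep the emotion label of the original. In practice the augmented copies would be generated only from training clips, so that test clips stay independent of the training set.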
