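Urban Sound Recognition Using a Convolutional Recurrent Neural Network Based on an Attention Model
Cite this article: YANG Lei, ZHAO Hongdong. Urban sound recognition using a convolutional recurrent neural network based on an attention model[J]. Science Technology and Engineering, 2020, 20(33): 13757-13761
Authors: YANG Lei  ZHAO Hongdong
Affiliation: School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300300 (both authors)
Funding: Supported by the fund of the Key Laboratory of Optoelectronic Information Control and Security Technology (614210701041705)
Abstract:
Environment sound recognition (ESR) plays an important role in fields such as context awareness and assistive technology. Convolutional neural networks (CNN) and recurrent neural networks (RNN), two of the most representative feature extraction methods, have both achieved remarkable results in speech and music signal processing. Each has drawbacks, however: a CNN cannot effectively extract temporal features, and an RNN is at a clear disadvantage in extracting spatial features. To extract and exploit both temporal and spatial features effectively, a new model is proposed. A time-distributed CNN extracts urban environmental sound features from Mel spectrograms; a bidirectional long short-term memory network (BiLSTM) then obtains temporal information from the CNN output; finally, an attention mechanism is applied to the output sequence of the bidirectional recurrent network (BRNN), so that the model focuses on the features most relevant to the urban environmental sound before making a classification decision. The attention mechanism both improves classification accuracy and makes the model more interpretable. Experimental results show that the model achieves a classification accuracy of 80.2% on the UrbanSound8K dataset, which is better than previously reported results on the same dataset.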

Keywords: convolutional neural network  bidirectional long short-term memory network  attention mechanism
Received: 2019-10-28
Revised: 2020-09-11

Urban Sound Classification Using Convolutional Recurrent Neural Networks with Attention Model
YANG Lei, ZHAO Hongdong. Urban Sound Classification Using Convolutional Recurrent Neural Networks with Attention Model[J]. Science Technology and Engineering, 2020, 20(33): 13757-13761
Authors:YANG Lei  ZHAO Hongdong
Affiliation:School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300300
Abstract:
Environment sound recognition (ESR) is widely applied in context awareness and assistive technologies. Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two of the most representative feature extraction methods, and both have achieved remarkable results in speech and music signal processing. However, a CNN cannot effectively extract temporal features, and an RNN is at a disadvantage in extracting spatial features. To extract and use both temporal and spatial features effectively, a novel model (CNN + BiLSTM + attention mechanism) was proposed. In this model, a time-distributed CNN was adopted to learn salient features from Mel spectrograms, a bidirectional long short-term memory network (BiLSTM) was then used to obtain temporal information from the CNN output, and finally an attention mechanism was applied to the output sequence of the BiLSTM to focus on the features most relevant to the urban environmental sound. The attention mechanism both improved classification accuracy and made the model more interpretable. Experimental results show that the model achieves a classification accuracy of 80.2% on the UrbanSound8K dataset, which is better than previously reported results on the same dataset.
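The final stage described in the abstract, attention over the recurrent output sequence, can be illustrated with a small sketch. The abstract does not state which scoring function the authors used, so the dot-product scoring against a single query vector below is an assumption, as are the names `softmax` and `attention_pool` and the toy values; in a trained model the query would be learned and the hidden states would come from the BiLSTM.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse a sequence of hidden states into one context vector.

    hidden_states: T vectors of size D (stand-ins for BiLSTM outputs).
    query: a D-dimensional scoring vector (assumed; learned in a real model).
    Returns (context, weights); the weights sum to 1 over the T time steps.
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    # Three toy time steps with two features each (hypothetical values).
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [1.0, 1.0]
    context, weights = attention_pool(states, query)
    print("weights:", [round(w, 3) for w in weights])
    print("context:", [round(c, 3) for c in context])
```

The weights returned alongside the context vector show which time steps contributed most to the pooled representation, which is one way an attention layer can support the interpretability the abstract mentions. A classifier would then operate on the context vector rather than on the last hidden state alone.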
Keywords:convolutional neural network  bidirectional long short-term memory network  attention mechanism
This article is indexed in Wanfang Data and other databases.