Inflated VGGNet-16 networks for human action recognition
Cite this article: WANG Zhen, LIU Ruimin, HUANG Qiongtao. Inflated VGGNet-16 networks for human action recognition[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2020, 47(3): 114.
Authors: WANG Zhen  LIU Ruimin  HUANG Qiongtao
Institution: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Fund project: National Natural Science Foundation of China (61863018)
Abstract: Current human action recognition algorithms based on the C3D network suffer from a shallow network structure, poor feature extraction ability, a lack of available pre-trained models, and long training times. To address these problems, a new 3D human action recognition network was designed on the basis of the deeper VGGNet-16 network by adding batch normalization layers and using the inflating method to initialize the network from an ImageNet pre-trained model. In experiments on the standard datasets UCF101 and HMDB-51, center-cropped images were used as the input to the designed network. When trained from scratch on UCF101, the network achieved an accuracy 9.2% higher than that of the original C3D network and converged faster, verifying that the designed Inflated VGGNet-16 network has stronger feature extraction and generalization ability. Finally, with ten-fold data augmentation added, the designed network reached accuracies of 89.6% and 61.7% on the two standard datasets, a 7.3% improvement over the shallower C3D network on UCF101, surpassing the traditional improved dense trajectories (iDT) method and the classic two-stream convolutional neural network and thus achieving high action recognition accuracy.
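The key step described above is inflating the 2D ImageNet-pretrained VGGNet-16 convolution kernels into 3D kernels, so that the 3D network does not start from random weights. The following PyTorch sketch illustrates one common way to do this; the temporal kernel depth of 3, the use of torchvision's vgg16_bn weights, and the 16-frame 112x112 clip shape are illustrative assumptions rather than details confirmed by the abstract.

import torch
import torch.nn as nn
import torchvision.models as models


def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Build a Conv3d initialized by repeating a pretrained 2D kernel over time."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    # Repeat the (out, in, kH, kW) kernel time_dim times along a new temporal
    # axis and divide by time_dim, so the inflated filter responds to a static
    # (frame-repeated) clip exactly as the 2D filter responds to one frame.
    w2d = conv2d.weight.data                                    # (out, in, kH, kW)
    conv3d.weight.data = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim
    if conv2d.bias is not None:
        conv3d.bias.data = conv2d.bias.data.clone()
    return conv3d


if __name__ == "__main__":
    # Load ImageNet-pretrained VGGNet-16 with batch normalization from torchvision.
    vgg16_bn = models.vgg16_bn(weights=models.VGG16_BN_Weights.IMAGENET1K_V1)
    first_2d = vgg16_bn.features[0]            # first 3x3 Conv2d layer
    first_3d = inflate_conv2d(first_2d)        # inflated 3x3x3 Conv3d layer
    clip = torch.randn(1, 3, 16, 112, 112)     # (batch, channels, frames, H, W)
    print(first_3d(clip).shape)                # torch.Size([1, 64, 16, 112, 112])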

Keywords: action recognition    VGGNet-16    Inflating    ImageNet pre-training    data augmentation
Received: 2019-12-03

Inflated VGGNet-16 networks for human action recognition
WANG Zhen, LIU Ruimin, HUANG Qiongtao. Inflated VGGNet-16 networks for human action recognition[J]. Journal of Beijing University of Chemical Technology, 2020, 47(3): 114.
Authors: WANG Zhen  LIU Ruimin  HUANG Qiongtao
Institution: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Abstract: Current 3D networks for human action recognition, typified by C3D, suffer from a shallow network structure, poor feature extraction ability, a lack of available pre-trained models, and long training times. Starting from the deeper VGGNet-16 network, adding batch normalization layers, and using the inflating method to initialize the network from an ImageNet pre-trained model, we have designed a new 3D human action recognition network. In experimental analysis on the standard datasets UCF101 and HMDB-51, center-cropped images were used as the input to the network. When trained from scratch on UCF101, our network was 9.2% more accurate than the original C3D network and converged faster, which shows that the Inflated VGGNet-16 network has stronger feature extraction and better generalization capabilities. Finally, with ten-fold data augmentation added, our network achieved accuracies of 89.6% and 61.7% on UCF101 and HMDB-51, respectively; the UCF101 result is 7.3% higher than that of the shallower C3D network and also exceeds both the traditional improved dense trajectories (iDT) method and the classic two-stream convolutional neural network, giving high action recognition accuracy.
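The "ten-fold data augmentation" mentioned above is commonly realized as four corner crops plus a center crop of each frame, together with their horizontal flips, giving ten views. The sketch below follows that interpretation, which, along with the 112x112 crop size, is an assumption rather than a detail given in the abstract.

import torch


def ten_crop_views(frame: torch.Tensor, size: int = 112) -> list:
    """Return 10 views of a (C, H, W) frame: 4 corner crops, a center crop,
    and the horizontal flip of each."""
    c, h, w = frame.shape
    top, left = (h - size) // 2, (w - size) // 2
    crops = [
        frame[:, :size, :size],                      # top-left
        frame[:, :size, w - size:],                  # top-right
        frame[:, h - size:, :size],                  # bottom-left
        frame[:, h - size:, w - size:],              # bottom-right
        frame[:, top:top + size, left:left + size],  # center
    ]
    flips = [torch.flip(crop, dims=[2]) for crop in crops]  # flip along width
    return crops + flips


if __name__ == "__main__":
    frame = torch.randn(3, 128, 171)     # a resized video frame (C, H, W)
    views = ten_crop_views(frame)
    print(len(views), views[0].shape)    # 10 torch.Size([3, 112, 112])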
Keywords: action recognition    VGGNet-16    Inflating    ImageNet pre-training    data augmentation
This article has been indexed by CNKI and other databases.