Inflated VGGNet-16 networks for human action recognition
Cite this article: WANG Zhen, LIU Ruimin, HUANG Qiongtao. Inflated VGGNet-16 networks for human action recognition[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2020, 47(3): 114.
Authors: WANG Zhen  LIU Ruimin  HUANG Qiongtao
Institution: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Fund project: National Natural Science Foundation of China (61863018)
Abstract: Current human action recognition algorithms based on the C3D network suffer from a shallow network structure, poor feature extraction ability, a lack of available pre-trained models, and long training times. To address these problems, a new 3D human action recognition network was designed on the basis of the deeper VGGNet-16 network by adding batch normalization layers and using the inflating method to initialize the network from an ImageNet pre-trained model. In experiments on the standard datasets UCF101 and HMDB-51, center-cropped images were used as the input to the designed network. When trained from scratch on UCF101, the network achieved an accuracy 9.2% higher than that of the original C3D network and converged faster, verifying that the designed Inflated VGGNet-16 network has stronger feature extraction and generalization ability. Finally, with ten-fold data augmentation added, the designed network reached accuracies of 89.6% and 61.7% on the two standard datasets, a 7.3% improvement over the shallower C3D network on UCF101, surpassing the traditional improved dense trajectories (iDT) method and the classic two-stream convolutional neural network and thus achieving high action recognition accuracy.
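The key step described above is inflating the 2D ImageNet-pretrained VGGNet-16 convolution kernels into 3D kernels, so that the 3D network does not start from random weights. The following PyTorch sketch illustrates one common way to do this; the temporal kernel depth of 3, the use of torchvision's vgg16_bn weights, and the 16-frame 112x112 clip shape are illustrative assumptions rather than details confirmed by the abstract.

import torch
import torch.nn as nn
import torchvision.models as models


def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Build a Conv3d initialized by repeating a pretrained 2D kernel over time."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    # Repeat the (out, in, kH, kW) kernel time_dim times along a new temporal
    # axis and divide by time_dim, so the inflated filter responds to a static
    # (frame-repeated) clip exactly as the 2D filter responds to one frame.
    w2d = conv2d.weight.data                                    # (out, in, kH, kW)
    conv3d.weight.data = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim
    if conv2d.bias is not None:
        conv3d.bias.data = conv2d.bias.data.clone()
    return conv3d


if __name__ == "__main__":
    # Load ImageNet-pretrained VGGNet-16 with batch normalization from torchvision.
    vgg16_bn = models.vgg16_bn(weights=models.VGG16_BN_Weights.IMAGENET1K_V1)
    first_2d = vgg16_bn.features[0]            # first 3x3 Conv2d layer
    first_3d = inflate_conv2d(first_2d)        # inflated 3x3x3 Conv3d layer
    clip = torch.randn(1, 3, 16, 112, 112)     # (batch, channels, frames, H, W)
    print(first_3d(clip).shape)                # torch.Size([1, 64, 16, 112, 112])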

Keywords: action recognition    VGGNet-16    Inflating    ImageNet pre-training    data augmentation
Received: 2019-12-03

Inflated VGGNet-16 networks for human action recognition
WANG Zhen, LIU Ruimin, HUANG Qiongtao. Inflated VGGNet-16 networks for human action recognition[J]. Journal of Beijing University of Chemical Technology, 2020, 47(3): 114.
Authors: WANG Zhen  LIU Ruimin  HUANG Qiongtao
Institution: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China
Abstract: Current 3D networks for human action recognition, typified by C3D, suffer from a shallow network structure, poor feature extraction ability, a lack of available pre-trained models, and long training times. Starting from the deeper VGGNet-16 network, adding batch normalization layers, and using the inflating method to initialize the network from an ImageNet pre-trained model, we have designed a new 3D human action recognition network. In experimental analysis on the standard datasets UCF101 and HMDB-51, center-cropped images were used as the input to the network. When trained from scratch on UCF101, our network was 9.2% more accurate than the original C3D network and converged faster, which shows that the Inflated VGGNet-16 network has stronger feature extraction and better generalization capabilities. Finally, with ten-fold data augmentation added, our network achieved accuracies of 89.6% and 61.7% on UCF101 and HMDB-51, respectively; the UCF101 result is 7.3% higher than that of the shallower C3D network and also exceeds both the traditional improved dense trajectories (iDT) method and the classic two-stream convolutional neural network, giving high action recognition accuracy.
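The "ten-fold data augmentation" mentioned above is commonly realized as four corner crops plus a center crop of each frame, together with their horizontal flips, giving ten views. The sketch below follows that interpretation, which, along with the 112x112 crop size, is an assumption rather than a detail given in the abstract.

import torch


def ten_crop_views(frame: torch.Tensor, size: int = 112) -> list:
    """Return 10 views of a (C, H, W) frame: 4 corner crops, a center crop,
    and the horizontal flip of each."""
    c, h, w = frame.shape
    top, left = (h - size) // 2, (w - size) // 2
    crops = [
        frame[:, :size, :size],                      # top-left
        frame[:, :size, w - size:],                  # top-right
        frame[:, h - size:, :size],                  # bottom-left
        frame[:, h - size:, w - size:],              # bottom-right
        frame[:, top:top + size, left:left + size],  # center
    ]
    flips = [torch.flip(crop, dims=[2]) for crop in crops]  # flip along width
    return crops + flips


if __name__ == "__main__":
    frame = torch.randn(3, 128, 171)     # a resized video frame (C, H, W)
    views = ten_crop_views(frame)
    print(len(views), views[0].shape)    # 10 torch.Size([3, 112, 112])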
Keywords: action recognition    VGGNet-16    Inflating    ImageNet pre-training    data augmentation
This article has been indexed by CNKI and other databases.