Received: 2021-06-21

Activation Map Adaptation Model for Knowledge Distillation
WU Zhiyuan, QI Hong, JIANG Yu, CUI Chupeng, YANG Zongmin, XUE Xinhui. Activation Map Adaptation Model for Knowledge Distillation[J]. Journal of Jilin University (Science Edition), 2022, 60(4): 881-888.
Authors: WU Zhiyuan, QI Hong, JIANG Yu, CUI Chupeng, YANG Zongmin, XUE Xinhui
Institution:1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
Abstract: Embedded and mobile devices have limited computational and storage resources, and the optimization of compact networks easily converges to poor local optima. To address this problem, we propose an activation map adaptation model for knowledge distillation, composed of an activation map adapter and an activation map adaptation knowledge distillation strategy. First, the adapter stacks heterogeneous convolutions and visual feature expression modules to achieve activation map size matching, synchronous transformation of teacher and student network features, and adaptive matching of semantic information. Second, the distillation strategy embeds the adapter into the teacher network to reconstruct it and, during training, adaptively searches for features suitable for supervising the hidden layers of the student network; the front-end output of the adapter hints the training of the front layers of the student network, realizing knowledge transfer from the teacher to the student network, which is further optimized under a learning-rate constraint. Finally, experiments on the CIFAR-10 image classification dataset show that the proposed model improves classification accuracy by 0.6%, reduces inference loss by 65%, and cuts the time to converge to 78.2% accuracy to 5.6% of that without knowledge transfer.
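The core mechanism described above — an adapter that projects the teacher's activation maps onto the student's channel dimension so that a hint loss can supervise the student's hidden layers — can be sketched in a few lines. The sketch below uses NumPy; the function names, the 1×1-convolution form of the adapter, the MSE hint loss, and all shapes are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def adapt_1x1(teacher_feat, W):
    """1x1-convolution adapter: project teacher channels onto student channels.
    teacher_feat: (C_t, H, W_sp); W: (C_s, C_t); returns (C_s, H, W_sp)."""
    return np.einsum('st,thw->shw', W, teacher_feat)

def hint_loss(student_feat, adapted_teacher_feat):
    """MSE between a student hidden feature and the adapted teacher feature."""
    return np.mean((student_feat - adapted_teacher_feat) ** 2)

rng = np.random.default_rng(0)
t_feat = rng.standard_normal((64, 8, 8))   # teacher activation map: 64 channels
s_feat = rng.standard_normal((16, 8, 8))   # student activation map: 16 channels
W = rng.standard_normal((16, 64)) * 0.1    # learnable adapter weights

adapted = adapt_1x1(t_feat, W)             # now shape-compatible with the student
loss = hint_loss(s_feat, adapted)          # gradient of this loss trains the
                                           # student's front layers
```

In practice the adapter weights `W` would be trained jointly with the reconstructed teacher, and the hint loss would be added to the usual task loss for the student's front layers.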
Keywords: artificial intelligence; knowledge distillation; activation map adaptation; model transfer; image classification