Received: 2021-06-21

Activation Map Adaptation Model for Knowledge Distillation
WU Zhiyuan, QI Hong, JIANG Yu, CUI Chupeng, YANG Zongmin, XUE Xinhui. Activation Map Adaptation Model for Knowledge Distillation[J]. Journal of Jilin University (Science Edition), 2022, 60(4): 881-888.
Authors: WU Zhiyuan, QI Hong, JIANG Yu, CUI Chupeng, YANG Zongmin, XUE Xinhui
Institution:1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
Abstract: Embedded and mobile devices have limited computational and storage resources, and the optimization of compact networks easily converges to poor local optima. To address this problem, we propose an activation map adaptation model for knowledge distillation, composed of an activation map adapter and an activation map adaptation knowledge distillation strategy. First, the adapter stacks heterogeneous convolutions and visual feature expression modules to achieve activation map size matching, synchronous transformation of teacher and student network features, and adaptive matching of semantic information. Second, the distillation strategy embeds the adapter into the teacher network to reconstruct it and, during training, adaptively searches for features suitable for supervising the hidden layers of the student network; the front-end output of the adapter hints the training of the front layers of the student network, realizing knowledge transfer from the teacher to the student network, which is further optimized under a learning-rate constraint. Finally, experiments on the CIFAR-10 image classification dataset show that the proposed model improves classification accuracy by 0.6%, reduces inference loss by 65%, and cuts the time to converge to 78.2% accuracy to 5.6% of that without knowledge transfer.
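The core mechanism described above — an adapter that projects the teacher's activation maps onto the student's channel dimension so that a hint loss can supervise the student's hidden layers — can be sketched in a few lines. The sketch below uses NumPy; the function names, the 1×1-convolution form of the adapter, the MSE hint loss, and all shapes are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def adapt_1x1(teacher_feat, W):
    """1x1-convolution adapter: project teacher channels onto student channels.
    teacher_feat: (C_t, H, W_sp); W: (C_s, C_t); returns (C_s, H, W_sp)."""
    return np.einsum('st,thw->shw', W, teacher_feat)

def hint_loss(student_feat, adapted_teacher_feat):
    """MSE between a student hidden feature and the adapted teacher feature."""
    return np.mean((student_feat - adapted_teacher_feat) ** 2)

rng = np.random.default_rng(0)
t_feat = rng.standard_normal((64, 8, 8))   # teacher activation map: 64 channels
s_feat = rng.standard_normal((16, 8, 8))   # student activation map: 16 channels
W = rng.standard_normal((16, 64)) * 0.1    # learnable adapter weights

adapted = adapt_1x1(t_feat, W)             # now shape-compatible with the student
loss = hint_loss(s_feat, adapted)          # gradient of this loss trains the
                                           # student's front layers
```

In practice the adapter weights `W` would be trained jointly with the reconstructed teacher, and the hint loss would be added to the usual task loss for the student's front layers.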
Keywords: artificial intelligence; knowledge distillation; activation map adaptation; model transfer; image classification