Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (4): 1174-1184. doi: 10.12305/j.issn.1001-506X.2024.04.05

• Electronic Technology •

Multi-teacher joint knowledge distillation based on CenterNet

Shaohua LIU, Kang DU, Chundong SHE, Ao YANG

  1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100080, China
  • Received: 2022-12-05  Online: 2024-03-25  Published: 2024-03-25
  • Corresponding author: Chundong SHE
  • About the authors: Shaohua LIU (1976—), male, associate professor, Ph.D.; research interests: artificial intelligence and telecommunications engineering
    Kang DU (1998—), male, master's student; research interests: artificial intelligence and machine learning
    Chundong SHE (1971—), male, senior engineer, Ph.D.; research interests: mobile communications, artificial intelligence, and embedded systems
    Ao YANG (1998—), male, master's student; research interests: artificial intelligence and machine learning
  • Supported by:
    National Natural Science Foundation of China (91938301)

Abstract:

This paper introduces a multi-teacher joint knowledge distillation scheme based on a lightweight CenterNet. The proposed scheme effectively mitigates the performance degradation caused by model lightweighting and significantly narrows the performance gap between the teacher and student models. A large-scale complex model serves as the teacher to guide the training of the lightweight student model. Compared with the conventional training scheme, the proposed knowledge distillation training scheme enables the lightweight model to reach better detection performance after the same number of training epochs. The main contribution of this paper is multi-teacher joint knowledge distillation, a new knowledge distillation training scheme for the CenterNet object detection network. In subsequent experiments, a distillation attention mechanism is further introduced to improve the training effect of multi-teacher joint knowledge distillation. On the Visual Object Classes 2007 dataset (VOC2007), taking the lightweight MobileNetV2 network as the backbone as an example, the proposed scheme reduces the number of parameters by 74.7% and increases the inference speed by 70.5% compared with the traditional CenterNet (ResNet50 backbone), while the mean average precision (mAP) drops by only 1.99, achieving a better "performance-speed" balance. In addition, the experiments show that after the same 100 training epochs, the mAP of the lightweight model trained with the multi-teacher joint knowledge distillation scheme is 11.30 higher than that of the model trained with the ordinary training scheme.
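To make the training scheme concrete, the sketch below shows one way a multi-teacher joint distillation objective for CenterNet-style heads could be wired up in PyTorch. It is not the authors' code: the head interfaces, the choice of loss terms, the softmax-based teacher weighting standing in for the distillation attention mechanism, and the names distill_step, centernet_loss, and alpha are illustrative assumptions.

# A minimal sketch (not the authors' code) of multi-teacher joint knowledge
# distillation for CenterNet-style detection heads in PyTorch. Interfaces,
# loss terms, and the teacher-weighting scheme are illustrative assumptions.
import torch
import torch.nn.functional as F

def distill_step(student, teachers, images, targets, centernet_loss, alpha=0.5):
    """One training step: supervised CenterNet loss plus a distillation loss
    aggregated over several frozen teacher models."""
    s_heat, s_wh, s_off = student(images)        # assumed student heads: heatmap, size, offset

    per_teacher = []
    for teacher in teachers:
        with torch.no_grad():                    # teachers are frozen
            t_heat, t_wh, t_off = teacher(images)
        # Match each student head to the corresponding teacher head.
        per_teacher.append(F.mse_loss(s_heat, t_heat)
                           + F.l1_loss(s_wh, t_wh)
                           + F.l1_loss(s_off, t_off))
    per_teacher = torch.stack(per_teacher)

    # One possible "distillation attention": softmax weights over teachers,
    # favouring teachers the student currently agrees with. The paper's
    # attention mechanism may differ; a plain mean over teachers also works.
    weights = F.softmax(-per_teacher.detach(), dim=0)
    distill_loss = (weights * per_teacher).sum()

    # Joint objective: ground-truth supervision plus weighted distillation.
    return centernet_loss((s_heat, s_wh, s_off), targets) + alpha * distill_loss

In practice, centernet_loss would be the standard CenterNet focal/L1 objective computed against the ground-truth targets, and the per-teacher weighting is the natural place to plug in a learned attention module.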

Key words: lightweight, knowledge distillation, attention mechanism, joint training

CLC number: