Lightweight target detection algorithm based on deep learning

doi:10.12305/j.issn.1001-506X.2022.09.03

Systems Engineering and Electronics, 2022, Vol. 44, Issue (9): 2716-2725.


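Shuang SONG^1,2, Yue ZHANG^1,2, Linna ZHANG^3, Yigang CEN^1,2,*, Yidong LI^1

1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2. Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China
3. School of Mechanical Engineering, Guizhou University, Guiyang 550025, China

Received: 2021-11-22; Online: 2022-09-01; Published: 2022-09-01
Corresponding author: Yigang CEN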
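About the authors:
Shuang SONG (born 1998), male, master's student; research interests: machine vision, compression and acceleration of deep neural networks.
Yue ZHANG (born 1990), female, Ph.D. candidate; research interests: deep learning, person re-identification, pattern recognition.
Linna ZHANG (born 1977), female, lecturer and master's supervisor; research interests: industrial product defect detection, machine vision.
Yigang CEN (born 1978), male, professor, Ph.D.; research interests: low-rank matrix reconstruction, sparse representation, wavelet analysis, anomaly detection.
Yidong LI (born 1982), male, professor, Ph.D.; research interests: advanced computing, big data analysis and security, privacy protection, intelligent transportation.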
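Funding:
National Key Research and Development Program of China (2019YFB2204200); National Natural Science Foundation of China (62062021, 61872034, 62011530042); Beijing Natural Science Foundation (4202055); Guangxi Natural Science Foundation (2018GXNSFBA281086); Guizhou Provincial Science and Technology Program (Qiankezhongyindi [2021] 4023)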




Abstract:

Deep convolutional neural networks have performed well in many fields, but at the cost of heavy computation and large numbers of parameters. To address the high demand for computational resources and the heavy memory consumption of current object detection algorithms based on deep convolutional neural networks, a high-performance lightweight network model is proposed. First, the Stem module is fused with ShuffleNet V2 to improve the network's feature extraction capability. The fused network is then used to rebuild the backbone of the original YOLOv5, which significantly reduces the network's computational cost and memory footprint. In addition, deformable convolution is introduced to improve detection performance. Experimental results on road surveillance images and on the VOC and COCO datasets show that the proposed model maintains detection accuracy while reducing the number of parameters and the model size by 90%. It also requires only 18% of the computation (FLOPs) of the original model. The resulting lightweight detector is better suited for deployment in scenarios with limited computational resources and strict real-time requirements.
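ShuffleNet V2 [24] is the network that the abstract fuses with the Stem module. Its stride-1 unit works as follows:

- It splits the input channels into two halves.
- One half is left as an identity branch.
- The other half goes through a 1×1 conv, a 3×3 depthwise conv and a second 1×1 conv.
- The two halves are concatenated, and a channel shuffle interleaves them so the next unit mixes information from both.

The sketch below shows only the channel bookkeeping of that shuffle, in plain Python with no convolutions. The function name `channel_shuffle` and the channel labels are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Reorder channels as a ShuffleNet-style channel shuffle would.

    Equivalent to reshaping the channel axis to (groups, C // groups),
    transposing it to (C // groups, groups) and flattening it again.
    """
    num_channels = len(channels)
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("channel count must be divisible by a positive group count")
    per_group = num_channels // groups
    # Output slot i * groups + g receives channel g * per_group + i.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


# Stride-1 ShuffleNet V2 unit, channel bookkeeping only: after the channel split,
# the left half is the identity branch; the right half would pass through
# 1x1 conv -> 3x3 depthwise conv -> 1x1 conv, which keeps its channel count.
left = ["L0", "L1", "L2", "L3"]
right = ["R0", "R1", "R2", "R3"]
print(channel_shuffle(left + right, groups=2))
# ['L0', 'R0', 'L1', 'R1', 'L2', 'R2', 'L3', 'R3']
```

After the shuffle, every pair of adjacent channels holds one identity channel and one processed channel. When the next unit splits the channels in half again, each half therefore contains channels from both branches.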

Key words: object detection, convolutional neural network, lightweight network, single-stage detection algorithm, deformable convolution
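Deformable convolution [26] replaces the fixed sampling grid of a k×k convolution with learned per-tap offsets. Each output is y(p0) = Σ_k w_k · x(p0 + p_k + Δp_k). Fractional positions are read with bilinear interpolation, and an extra convolution predicts the offsets Δp_k. This page does not state which variant the authors use. DCN v2 [27], for example, also multiplies each sample by a learned modulation scalar, which is omitted here.

The following single-channel sketch computes one output value with hand-supplied offsets. The helper names `bilinear` and `deformable_conv_at` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Regular 3x3 sampling grid p_k as (row, col) displacements.
KERNEL_GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: Grid, py: float, px: float) -> float:
    """Sample a single-channel feature map at a fractional (row, col); zero outside."""
    height, width = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < height and 0 <= xi < width:
                weight = (1 - abs(py - yi)) * (1 - abs(px - xi))
                value += weight * x[yi][xi]
    return value


def deformable_conv_at(x: Grid, weights: Sequence[float], p0: Tuple[int, int],
                       offsets: Sequence[Tuple[float, float]]) -> float:
    """One output of a 3x3 deformable convolution (DCN v1 form, one channel):
    y(p0) = sum_k w_k * x(p0 + p_k + dp_k), with dp_k supplied by the caller."""
    total = 0.0
    for w_k, (dy, dx), (oy, ox) in zip(weights, KERNEL_GRID, offsets):
        total += w_k * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return total


feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # values 0..15
mean_kernel = [1.0 / 9] * 9
zero_offsets = [(0.0, 0.0)] * 9  # reduces to an ordinary 3x3 convolution
shifted = [(0.0, 0.5)] * 9       # every tap moved half a pixel to the right
print(round(deformable_conv_at(feature, mean_kernel, (1, 1), zero_offsets), 6))  # 5.0
print(round(deformable_conv_at(feature, mean_kernel, (1, 1), shifted), 6))       # 5.5
```

With all offsets at zero, the operation is an ordinary 3×3 convolution. Shifting every tap by half a pixel makes each sample the average of two horizontal neighbours, so the mean rises by 0.5.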

CLC number: TP391

Shuang SONG, Yue ZHANG, Linna ZHANG, Yigang CEN, Yidong LI. Lightweight target detection algorithm based on deep learning[J]. Systems Engineering and Electronics, 2022, 44(9): 2716-2725.

Figures and tables: 15 (Figures 1-11; Tables 1-4)

References (46)

1	SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Proc. of the 31st AAAI Conference on Artificial Intelligence, 2017: 4278-4284.
2	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
3	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2): 1097-1105.
4	RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
5	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1409.1556.
6	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9.
7	HE K M, ZHANG X, REN S Q, et al. Identity mappings in deep residual networks[C]//Proc. of the European Conference on Computer Vision, 2016: 630-645.
8	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141.
9	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proc. of the European Conference on Computer Vision, 2016: 21-37.
10	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1701.06659.
11	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
12	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271.
13	REDMON J, FARHADI A. Yolov3: an incremental improvement[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.02767.
14	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
15	DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proc. of the Advances in Neural Information Processing Systems, 2016: 379-387.
16	ZHANG X Y, GAO H B, ZHAO J H, et al. Overview of deep learning intelligent driving methods[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(4): 438-444. (in Chinese)
17	CHEN C Y, SEFF A, KORNHAUSER A, et al. Deepdriving: learning affordance for direct perception in autonomous driving[C]//Proc. of the IEEE International Conference on Computer Vision, 2015: 2722-2730.
18	WANG Y F, LI Z P. Application research of target detection algorithm in edge environment[J]. Computer Engineering and Applications, 2021, 57(16): 220-227. doi: 10.3778/j.issn.1002-8331.2008-0280 (in Chinese)
19	CHEN H, SUN D Z. Target detection algorithm based on CS optimized deep learning convolutional neural network[J]. Machine Tool & Hydraulics, 2020, 48(6): 187-192. doi: 10.3969/j.issn.1001-3881.2020.06.028 (in Chinese)
20	HAN K, WANG Y H, TIAN Q, et al. Ghostnet: more features from cheap operations[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1580-1589.
21	XIONG Y Y, LIU H X, GUPTA S, et al. Mobiledets: searching for object detection architectures for mobile accelerators[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3825-3834.
22	WU B C, DAI X L, ZHANG P Z, et al. FBNet: hardware-aware efficient convnet design via differentiable neural architecture search[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10734-10742.
23	ZHANG X Y, ZHOU X Y, LIN M X, et al. Shufflenet: an extremely efficient convolutional neural network for mobile devices[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6848-6856.
24	MA N N, ZHANG X Y, ZHENG H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design[C]//Proc. of the European Conference on Computer Vision, 2018: 116-131.
25	WANG R J, LI X, LING C X. Pelee: a real-time object detection system on mobile devices[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.06882.
26	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 764-773.
27	ZHU X Z, HU H, LIN S, et al. Deformable convnets v2: more deformable, better results[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9308-9316.
28	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. doi: 10.1007/s11263-009-0275-4
29	HAN S, MAO H Z, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1510.00149.
30	HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1506.02626.
31	LIU Z, LI J G, SHEN Z Q, et al. Learning efficient convolutional networks through network slimming[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 2736-2744.
32	HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1503.02531.
33	LUO P, ZHU Z Y, LIU Z W, et al. Face model compression by distilling knowledge from neurons[C]//Proc. of the 30th AAAI Conference on Artificial Intelligence, 2016: 3560-3566.
34	WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 390-391.
35	HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. doi: 10.1109/TPAMI.2015.2389824
36	CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: efficient primitives for deep learning[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1410.0759.
37	HOWARD A G, ZHU M L, CHEN B, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1704.04861.
38	SANDLER M, HOWARD A, ZHU M L, et al. Mobilenetv2: inverted residuals and linear bottlenecks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520.
39	QIN Z, LI Z M, ZHANG Z N, et al. ThunderNet: towards real-time generic object detection on mobile devices[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2019: 6718-6727.
40	HUANG Z C, WANG J L, FU X, et al. DC-SPP-YOLO: dense connection and spatial pyramid pooling based YOLO for object detection[J]. Information Sciences, 2020, 522: 241-258. doi: 10.1016/j.ins.2020.02.067
41	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. Yolov4: optimal speed and accuracy of object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2004.10934.
42	WONG A, FAMUORI M, SHAFIEE M J, et al. YOLO nano: a highly compact you only look once convolutional neural network for object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1910.01271.
43	LONG X, DENG K P, WANG G D, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2007.12099.
44	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proc. of the European Conference on Computer Vision, 2014: 740-755.
45	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.08430.
46	ZHANG Y M, LEE C C, HSIEH J W, et al. CSL-YOLO: a new lightweight object detection system for edge computing[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.04829.

Data distribution
Class	Training	Test
Person	14 591	1 389
Motorcycle	3 468	372
Car	14 691	1 527
Bus	494	51
Pickup truck	4 379	414
Truck	7 538	752
Large truck	765	76

Model	Input size	GFLOPs	Model size/MB	mAP
MobileNet-SSD[25]	300×300	1.15	13.2	0.680
Pelee-SSD	304×304	2.4	21.68	0.709
Ours (320×320)	320×320	0.8	1.32	0.665
Tiny-YOLO	416×416	5.52	33.4	0.584
YOLO-Nano[42]	416×416	4.57	4.0	0.691
ThunderNet_MM[39]	416×416	-	32.9	0.738
PP-YOLO[43]	416×416	-	269	0.843
Ours (416×416)	416×416	1.3	1.32	0.681
YOLOv5s	512×512	10.9	13.73	0.852
Ours (512×512)	512×512	2.0	1.32	0.696
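For a fully convolutional detector, FLOPs grow roughly with the number of input pixels, that is, with the square of the side length. The sketch below assumes exactly quadratic scaling and starts from the 1.3 GFLOPs reported above for the proposed model at 416×416. Under that assumption it reproduces the 320×320 and 512×512 entries after rounding. The helper `scale_flops` is hypothetical.

```python
def scale_flops(flops_ref: float, size_ref: int, size_new: int) -> float:
    """Estimate FLOPs at a new square input size.

    Assumes a fully convolutional network, so cost grows with the number of
    input pixels, i.e. with the square of the side length.
    """
    return flops_ref * (size_new / size_ref) ** 2


ours_416 = 1.3  # GFLOPs at 416x416, as reported in the table
for side in (320, 512):
    print(side, round(scale_flops(ours_416, 416, side), 1))
# 320 0.8
# 512 2.0
```

The same rule takes YOLOv5s from 17.1 GFLOPs at 640×640 to about 10.9 at 512×512. This suggests that all the FLOPs entries were measured on the same architecture at different input sizes, not on retrained or modified variants.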

Model	Input size	GFLOPs	Model size/MB	AP (val)
PP-YOLO_MBV3_S	320×320	-	16	0.172
PP-YOLO-Tiny	416×416	-	4.2	0.227
YOLOX-Nano[45]	416×416	1.08	7.3	0.253
YOLOX-Tiny[45]	416×416	6.45	38.8	0.317
Tiny-YOLO	416×416	5.52	33.4	0.166
YOLOv4-Tiny	416×416	6.9	23.1	0.217
MM-YOLO-MBV2	416×416	-	14.5	0.239
CSL-YOLO[46]	416×416	1.47	14.6	0.245
YOLOv5s	640×640	17.1	13.73	0.367
Ours	416×416	1.3	1.32	0.231

Model	Parameters	Model size/MB	Inference time/ms	GFLOPs (416×416)	mAP@0.5
YOLOv5s-prune	2 665 659	4.97	5.5	3.1	0.868
YOLOv5-Shuffle	364 925	0.78	2.6	0.5	0.834
YOLOv5-ShuffleS	367 629	0.80	2.8	0.6	0.846
YOLOv5s	7 276 605	13.73	5.6	7.2	0.934
Ours	643 509	1.32	4.9	1.3	0.908
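The abstract's headline figures (about 90% fewer parameters and a 90% smaller model, with computation at 18% of the original) can be checked against the YOLOv5s and proposed-model rows of the table above. The sketch below only does arithmetic on those reported numbers. The dictionary keys and the helper `relative_cost` are illustrative names.

```python
# Figures copied from the comparison table (YOLOv5s vs. the proposed model).
YOLOV5S = {"parameters": 7_276_605, "size_mb": 13.73, "gflops_416": 7.2}
PROPOSED = {"parameters": 643_509, "size_mb": 1.32, "gflops_416": 1.3}


def relative_cost(model: dict, baseline: dict) -> dict:
    """Return each metric of `model` as a percentage of the baseline's value."""
    return {key: 100.0 * model[key] / baseline[key] for key in baseline}


for metric, pct in relative_cost(PROPOSED, YOLOV5S).items():
    print(f"{metric}: {pct:.1f}% of YOLOv5s ({100 - pct:.1f}% reduction)")
# parameters: 8.8% of YOLOv5s (91.2% reduction)
# size_mb: 9.6% of YOLOv5s (90.4% reduction)
# gflops_416: 18.1% of YOLOv5s (81.9% reduction)
```

The ratios agree with the abstract's rounded claims: roughly a 90% reduction in parameters and model size, and about 18% of the original FLOPs at 416×416.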
