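Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (9): 2716-2725. doi: 10.12305/j.issn.1001-506X.2022.09.03
Lightweight target detection algorithm based on deep learning
Shuang SONG1,2, Yue ZHANG1,2, Linna ZHANG3, Yigang CEN1,2,*, Yidong LI1
Received: 2021-11-22
Online: 2022-09-01
Published: 2022-09-01
Corresponding author: Yigang CEN
About the authors:
Shuang SONG (b. 1998), male, master's student; research interests: machine vision, compression and acceleration of deep neural networks | Yue ZHANG (b. 1990), female, PhD student; research interests: deep learning, person re-identification, pattern recognition | Linna ZHANG (b. 1977), female, lecturer, master's supervisor; research interests: industrial product defect detection, machine vision | Yigang CEN (b. 1978), male, professor, PhD; research interests: low-rank matrix reconstruction, sparse representation, wavelet analysis, anomaly detection | Yidong LI (b. 1982), male, professor, PhD; research interests: advanced computing, big data analysis and security, privacy protection, intelligent transportation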
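Abstract:
Deep convolutional neural networks perform well across many fields, but at the cost of heavy computation and large parameter counts. To address the excessive compute requirements and memory consumption of current object detection algorithms based on deep convolutional neural networks, a high-performance lightweight network model is proposed. First, the Stem module is fused with ShuffleNet V2 to improve the network's feature extraction capability, and the fused network is used to rebuild the backbone of the original YOLOv5, which significantly reduces computation and memory footprint. In addition, deformable convolution is introduced to improve detection performance. Tests on road surveillance images and on the VOC and COCO datasets show that, while maintaining detection accuracy, the proposed model reduces the parameter count and model size by 90% and requires only 18% of the computation of the original model. The resulting lightweight detector is better suited to deployment where computing resources are limited and real-time requirements are strict.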
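The abstract names ShuffleNet V2 as the basis of the rebuilt backbone. As a rough illustration of the channel shuffle step that ShuffleNet-style units rely on, the sketch below works on a plain Python list standing in for the channel axis of a feature map. It is not the authors' implementation: the names `channel_shuffle` and `shuffle_unit`, and the `branch` callable that stands in for the convolutional branch, are assumptions made for illustration only.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape, transpose, flatten).

    `channels` is a list standing in for the channel axis of a feature map.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # View the channels as a (groups, per_group) grid.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Read the grid column by column, which is the transposed, flattened order.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


def shuffle_unit(channels, branch):
    """Hypothetical stride-1 unit: split, transform one half, concat, shuffle."""
    if len(channels) % 2 != 0:
        raise ValueError("channel count must be even for the channel split")
    half = len(channels) // 2
    left, right = channels[:half], channels[half:]
    right = [branch(c) for c in right]  # stand-in for the convolutional branch
    return channel_shuffle(left + right, groups=2)


if __name__ == "__main__":
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
    print(shuffle_unit(["a", "b", "c", "d"], str.upper))  # ['a', 'C', 'b', 'D']
```

The shuffle lets the untouched half and the transformed half exchange information in the next unit without any extra multiply-accumulate operations, which is one reason this family of backbones is inexpensive.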
Shuang SONG, Yue ZHANG, Linna ZHANG, Yigang CEN, Yidong LI. Lightweight target detection algorithm based on deep learning[J]. Systems Engineering and Electronics, 2022, 44(9): 2716-2725.
Table 2. Experimental results on the VOC dataset
Model | Input size | FLOPs (billions) | Model size (MB) | mAP |
MobileNet-SSD | 300×300 | 1.15 | 13.2 | 0.680 |
Pelee-SSD | 304×304 | 2.4 | 21.68 | 0.709 |
Ours (320×320) | 320×320 | 0.8 | 1.32 | 0.665 |
Tiny-YOLO | 416×416 | 5.52 | 33.4 | 0.584 |
YOLO-Nano | 416×416 | 4.57 | 4.0 | 0.691 |
ThunderNet_MM | 416×416 | - | 32.9 | 0.738 |
PP-YOLO | 416×416 | - | 269 | 0.843 |
Ours (416×416) | 416×416 | 1.3 | 1.32 | 0.681 |
YOLOv5s | 512×512 | 10.9 | 13.73 | 0.852 |
Ours (512×512) | 512×512 | 2.0 | 1.32 | 0.696 |
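The headline ratios in the abstract can be checked against the 512×512 rows of Table 2. The short calculation below is only an arithmetic sketch: the dictionary layout and variable names are assumptions, and the numbers are copied from the table.

```python
# Values copied from the 512x512 rows of Table 2.
yolov5s = {"gflops": 10.9, "size_mb": 13.73}
ours = {"gflops": 2.0, "size_mb": 1.32}

# Fraction of the original model that remains after the redesign.
size_ratio = ours["size_mb"] / yolov5s["size_mb"]
flops_ratio = ours["gflops"] / yolov5s["gflops"]

print(f"model size: {size_ratio:.1%} of YOLOv5s ({1 - size_ratio:.1%} smaller)")
print(f"FLOPs: {flops_ratio:.1%} of YOLOv5s")
```

Under these figures the model size comes out at roughly one tenth of YOLOv5s and the FLOPs at roughly 18%, which matches the reductions stated in the abstract.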
Table 3. Experimental results on the COCO dataset
Model | Input size | FLOPs (billions) | Model size (MB) | AP (val) |
PP-YOLO_MBV3_S | 320×320 | - | 16 | 0.172 |
PP-YOLO-Tiny | 416×416 | - | 4.2 | 0.227 |
YOLOX-Nano | 416×416 | 1.08 | 7.3 | 0.253 |
YOLOX-Tiny | 416×416 | 6.45 | 38.8 | 0.317 |
Tiny-YOLO | 416×416 | 5.52 | 33.4 | 0.166 |
YOLOv4-Tiny | 416×416 | 6.9 | 23.1 | 0.217 |
MM-YOLO-MBV2 | 416×416 | - | 14.5 | 0.239 |
CSL-YOLO | 416×416 | 1.47 | 14.6 | 0.245 |
YOLOv5s | 640×640 | 17.1 | 13.73 | 0.367 |
Ours | 416×416 | 1.3 | 1.32 | 0.231 |
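In Table 3, YOLOv5s is listed at 640×640 while the proposed model is listed at 416×416, so their FLOPs are not directly comparable. A rough like-for-like estimate can be made by assuming that the cost of a fully convolutional detector grows in proportion to the number of input pixels. That assumption and the helper name `rescale_flops` are not from the paper, and the rescaled values are estimates rather than reported results.

```python
def rescale_flops(gflops, from_size, to_size):
    """Estimate FLOPs at another square input size.

    Assumes cost is proportional to pixel count, i.e. to the side length squared.
    """
    return gflops * (to_size / from_size) ** 2


# Sanity check of the assumption: Table 2 lists YOLOv5s at 10.9 for 512x512,
# and Table 3 lists it at 17.1 for 640x640.
check_640 = rescale_flops(10.9, 512, 640)

# Estimate (not a reported figure): YOLOv5s at the 416x416 size of the proposed model.
yolov5s_416 = rescale_flops(17.1, 640, 416)
ratio = 1.3 / yolov5s_416  # proposed model's FLOPs relative to that estimate

print(f"YOLOv5s rescaled 512 -> 640: {check_640:.2f} (table lists 17.1)")
print(f"YOLOv5s estimated at 416: {yolov5s_416:.2f}")
print(f"proposed model / YOLOv5s at 416: {ratio:.1%}")
```

The sanity check lands close to the listed 640×640 figure, and the estimated ratio at 416×416 is again about 18%, in line with the VOC comparison at 512×512.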
1 | SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Proc. of the 31st AAAI Conference on Artificial Intelligence, 2017: 4278-4284. |
2 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778. |
3 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2): 1097-1105. |
4 | RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y |
5 | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1409.1556. |
6 | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1-9. |
7 | HE K M, ZHANG X Y, REN S Q, et al. Identity mappings in deep residual networks[C]//Proc. of the European Conference on Computer Vision, 2016: 630-645. |
8 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7132-7141. |
9 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proc. of the European Conference on Computer Vision, 2016: 21-37. |
10 | FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1701.06659. |
11 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788. |
12 | REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 7263-7271. |
13 | REDMON J, FARHADI A. Yolov3: an incremental improvement[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.02767. |
14 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149. |
15 | DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks[C]//Proc. of the Advances in Neural Information Processing Systems, 2016: 379-387. |
16 | ZHANG X Y, GAO H B, ZHAO J H, et al. Overview of deep learning intelligent driving methods[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(4): 438-444. (in Chinese) |
17 | CHEN C Y, SEFF A, KORNHAUSER A, et al. Deepdriving: learning affordance for direct perception in autonomous driving[C]//Proc. of the IEEE International Conference on Computer Vision, 2015: 2722-2730. |
18 | WANG Y F, LI Z P. Application research of target detection algorithm in edge environment[J]. Computer Engineering and Applications, 2021, 57(16): 220-227. (in Chinese) doi: 10.3778/j.issn.1002-8331.2008-0280 |
19 | CHEN H, SUN D Z. Target detection algorithm based on CS optimized deep learning convolutional neural network[J]. Machine Tool & Hydraulics, 2020, 48(6): 187-192. (in Chinese) doi: 10.3969/j.issn.1001-3881.2020.06.028 |
20 | HAN K, WANG Y H, TIAN Q, et al. Ghostnet: more features from cheap operations[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 1580-1589. |
21 | XIONG Y Y, LIU H X, GUPTA S, et al. Mobiledets: searching for object detection architectures for mobile accelerators[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3825-3834. |
22 | WU B C, DAI X L, ZHANG P Z, et al. FBNet: hardware-aware efficient convnet design via differentiable neural architecture search[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 10734-10742. |
23 | ZHANG X Y, ZHOU X Y, LIN M X, et al. Shufflenet: an extremely efficient convolutional neural network for mobile devices[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6848-6856. |
24 | MA N N, ZHANG X Y, ZHENG H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design[C]//Proc. of the European Conference on Computer Vision, 2018: 116-131. |
25 | WANG R J, LI X, LING C X. Pelee: a real-time object detection system on mobile devices[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1804.06882. |
26 | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 764-773. |
27 | ZHU X Z, HU H, LIN S, et al. Deformable convnets v2: more deformable, better results[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 9308-9316. |
28 | EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. doi: 10.1007/s11263-009-0275-4 |
29 | HAN S, MAO H Z, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1510.00149. |
30 | HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1506.02626. |
31 | LIU Z, LI J G, SHEN Z Q, et al. Learning efficient convolutional networks through network slimming[C]//Proc. of the IEEE International Conference on Computer Vision, 2017: 2736-2744. |
32 | HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1503.02531. |
33 | LUO P, ZHU Z Y, LIU Z W, et al. Face model compression by distilling knowledge from neurons[C]//Proc. of the 30th AAAI Conference on Artificial Intelligence, 2016: 3560-3566. |
34 | WANG C Y, LIAO H Y M, WU Y H, et al. CSPNet: a new backbone that can enhance learning capability of CNN[C]//Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 390-391. |
35 | HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. doi: 10.1109/TPAMI.2015.2389824 |
36 | CHETLUR S, WOOLLEY C, VANDERMERSCH P, et al. cuDNN: efficient primitives for deep learning[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1410.0759. |
37 | HOWARD A G, ZHU M L, CHEN B, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1704.04861. |
38 | SANDLER M, HOWARD A, ZHU M L, et al. Mobilenetv2: inverted residuals and linear bottlenecks[C]//Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4510-4520. |
39 | QIN Z, LI Z M, ZHANG Z N, et al. ThunderNet: towards real-time generic object detection on mobile devices[C]//Proc. of the IEEE/CVF International Conference on Computer Vision, 2019: 6718-6727. |
40 | HUANG Z C, WANG J L, FU X, et al. DC-SPP-YOLO: dense connection and spatial pyramid pooling based YOLO for object detection[J]. Information Sciences, 2020, 522: 241-258. doi: 10.1016/j.ins.2020.02.067 |
41 | BOCHKOVSKIY A, WANG C Y, LIAO H Y M. Yolov4: optimal speed and accuracy of object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2004.10934. |
42 | WONG A, FAMUORI M, SHAFIEE M J, et al. YOLO nano: a highly compact you only look once convolutional neural network for object detection[EB/OL]. [2021-11-10]. https://arxiv.org/abs/1910.01271. |
43 | LONG X, DENG K P, WANG G D, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2007.12099. |
44 | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proc. of the European Conference on Computer Vision, 2014: 740-755. |
45 | GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.08430. |
46 | ZHANG Y M, LEE C C, HSIEH J W, et al. CSL-YOLO: a new lightweight object detection system for edge computing[EB/OL]. [2021-11-10]. https://arxiv.org/abs/2107.04829. |