Object detection algorithm combined with optimized feature extraction structure

doi:10.11772/j.issn.1001-9081.2021122122

Journal of Computer Applications, 2022, Vol. 42, Issue (11): 3558-3563. DOI: 10.11772/j.issn.1001-9081.2021122122

• ChinaVR 2021 •

Nan XIANG, Chuanzhong PAN, Gaoxiang YU

Liangjiang International College, Chongqing University of Technology, Chongqing 401135, China

Received: 2021-12-17 Revised: 2022-02-13 Accepted: 2022-02-14 Online: 2022-03-02 Published: 2022-11-10
Contact: Nan XIANG
About author: XIANG Nan, born in 1984, Ph.D., associate professor. His research interests include affective computing, social computing, and object detection.
PAN Chuanzhong, born in 1995, M.S. candidate. His research interests include object detection.
YU Gaoxiang, born in 1995, M.S. candidate. His research interests include object detection.
Supported by:
National Natural Science Foundation of China (61872051); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202001118); Application Research Project of Banan Science and Technology Commission (2018TJ02)

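Author biographies: XIANG Nan (born 1984), male, from Xunyang, Shaanxi, associate professor, Ph.D., CCF member. Main research interests: affective computing, social computing, object detection. E-mail: xiangnan@cqut.edu.cn
PAN Chuanzhong (born 1995), male, from Xianning, Hubei, M.S. candidate. Main research interest: object detection.
YU Gaoxiang (born 1995), male, from Shangrao, Jiangxi, M.S. candidate. Main research interest: object detection.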

Abstract:

To address the low detection precision of DEtection TRansformer (DETR) on small targets, an object detection algorithm with an optimized feature extraction structure, called CF-DETR (DETR combined with CSP-Darknet53 and Feature Pyramid Network), was proposed on the basis of DETR. Firstly, CSP-Darknet53, which incorporates the optimized Cross Stage Partial (CSP) network, was used to extract features from the original image and output feature maps at 4 scales. Secondly, the Feature Pyramid Network (FPN) was used to down-sample and up-sample the 4 scale feature maps and then splice and fuse them into a single feature map of size 52×52. Finally, this feature map was combined with the positional encoding information and input into the Transformer to obtain the feature sequence, which was passed to Feed-Forward Networks (FFNs) serving as the prediction heads to output the category and location of each predicted object. On the COCO2017 dataset, compared with DETR, CF-DETR reduces the number of model hyperparameters by 2×10⁶, improves the average detection precision on small objects by 2.1 percentage points, and improves the average detection precision on medium- and large-sized objects by 2.3 percentage points. Experimental results show that the optimized feature extraction structure can effectively improve the detection precision of DETR while reducing the number of model hyperparameters.
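
The data flow described in the abstract can be traced with simple shape bookkeeping. The Python sketch below is only an illustration under stated assumptions: the 416×416 input size, the backbone strides of 4, 8, 16 and 32, the channel widths, and the use of channel-wise concatenation for the "splice" step are all hypothetical choices, not values given by the authors. Only the 4 scales and the 52×52 fused map come from the abstract. No network is built; the code just works out which maps would need down-sampling or up-sampling and how long the resulting Transformer input sequence would be.

```python
# Shape bookkeeping only: no network is built and no weights are involved.
# Assumptions (NOT from the abstract): 416x416 input, backbone strides
# 4/8/16/32, illustrative channel widths, channel-wise concatenation.
INPUT_SIZE = 416
STRIDES = (4, 8, 16, 32)
CHANNELS = (128, 256, 512, 1024)  # hypothetical channel widths
TARGET_SIZE = 52                  # fused map size stated in the abstract


def backbone_shapes(input_size, strides, channels):
    """Return (channels, height, width) for each backbone output scale."""
    return [(c, input_size // s, input_size // s)
            for s, c in zip(strides, channels)]


def resample_plan(shapes, target):
    """For each scale, say whether it must be down-/up-sampled and by how much."""
    plan = []
    for _, height, _ in shapes:
        if height > target:
            plan.append(("down", height // target))
        elif height < target:
            plan.append(("up", target // height))
        else:
            plan.append(("keep", 1))
    return plan


def fused_shape(shapes, target):
    """Shape after concatenating all scales along the channel axis."""
    return (sum(c for c, _, _ in shapes), target, target)


shapes = backbone_shapes(INPUT_SIZE, STRIDES, CHANNELS)
plan = resample_plan(shapes, TARGET_SIZE)
fused = fused_shape(shapes, TARGET_SIZE)
# DETR-style flattening: one Transformer token per spatial position.
sequence_length = fused[1] * fused[2]

print(shapes)
print(plan)
print(fused, sequence_length)
```

Whatever the real channel widths are, a 52×52 map flattens to 52 × 52 = 2704 positions, so under DETR-style flattening the Transformer would see a sequence of 2704 tokens per image.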

Key words: object detection, small target, DEtection TRansformer (DETR) algorithm, feature extraction, Cross Stage Partial (CSP) network, Feature Pyramid Network (FPN), Transformer

CLC Number: TP391.41

Nan XIANG, Chuanzhong PAN, Gaoxiang YU. Object detection algorithm combined with optimized feature extraction structure[J]. Journal of Computer Applications, 2022, 42(11): 3558-3563.

Model	R_AP	AP₅₀	AP_S	AP_M	AP_L
DETR-R50-DC5	29.9	50.5	10.6	31.8	46.8
DETR-R50	28.6	49.0	9.1	30.5	46.5
DETR-R101	30.0	51.7	10.2	32.1	48.4
DETR-Dn53-FPN	31.3	51.7	11.5	33.8	48.8
CF-DETR	30.2	51.3	11.2	32.8	48.8
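
The improvements quoted in the abstract can be checked against the rows above. The short Python sketch below assumes that the "DETR" baseline in the abstract is the DETR-R50 row; this is an inference, since the page does not say which row is meant. The numbers are copied from the table, and the helper name `gains` is introduced here purely for illustration.

```python
# Rows copied from the results table; columns follow METRICS.
METRICS = ("R_AP", "AP50", "AP_S", "AP_M", "AP_L")
RESULTS = {
    "DETR-R50-DC5": (29.9, 50.5, 10.6, 31.8, 46.8),
    "DETR-R50": (28.6, 49.0, 9.1, 30.5, 46.5),
    "DETR-R101": (30.0, 51.7, 10.2, 32.1, 48.4),
    "DETR-Dn53-FPN": (31.3, 51.7, 11.5, 33.8, 48.8),
    "CF-DETR": (30.2, 51.3, 11.2, 32.8, 48.8),
}


def gains(model, baseline):
    """Per-metric difference (model minus baseline), in percentage points."""
    return {
        metric: round(a - b, 1)
        for metric, a, b in zip(METRICS, RESULTS[model], RESULTS[baseline])
    }


# Assumption: the abstract's "DETR" baseline is the DETR-R50 row.
delta = gains("CF-DETR", "DETR-R50")
print(delta)
```

With DETR-R50 as the baseline, the differences for AP_S, AP_M and AP_L come out to 2.1, 2.3 and 2.3 percentage points, which matches the figures stated in the abstract.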
