《计算机应用》(Journal of Computer Applications) ›› 2023, Vol. 43 ›› Issue (9): 2727-2734. DOI: 10.11772/j.issn.1001-9081.2022081249

• Artificial Intelligence •


Feature pyramid network algorithm based on context information and multi-scale fusion importance awareness

Hao YANG, Yi ZHANG

  1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
  • Received: 2022-08-23 Revised: 2022-10-22 Accepted: 2022-11-03 Online: 2023-01-11 Published: 2023-09-10
  • Contact: Yi ZHANG
  • About author: YANG Hao, born in 1999 in Ya'an, Sichuan, M.S. candidate. His research interests include computer vision and object detection.
  • Supported by:
    National Natural Science Foundation of China (U20A20161)


Abstract:

Aiming at the problem that the classification and localization sub-tasks in object detection require a large receptive field and a high resolution respectively, and that it is difficult to balance these two contradictory requirements, a feature pyramid network algorithm based on the attention mechanism was proposed for object detection. In the algorithm, multiple different receptive fields were integrated to obtain richer semantic information, multi-scale feature maps were fused in a way that pays more attention to the importance of different feature maps, and the fused feature maps were further refined under the guidance of the attention mechanism. Firstly, multi-scale receptive fields were obtained through atrous convolutions with different dilation rates, which enhanced the semantic information while preserving the resolution. Secondly, through Multi-Level Fusion (MLF), multiple feature maps of different scales were resized to the same resolution by upsampling or pooling operations and then fused. Finally, the proposed Attention-guided Feature Refinement Module (AFRM) was used to refine the fused feature maps, enriching the semantic information and eliminating the aliasing effect caused by fusion. After replacing the Feature Pyramid Network (FPN) in Faster R-CNN with the proposed feature pyramid, experiments were conducted on the MS COCO 2017 dataset. The results show that with ResNet (Residual Network) backbones of depth 50 and 101, the Average Precision (AP) of the model reaches 39.2% and 41.0% respectively, which is 1.4 and 1.0 percentage points higher than that of Faster R-CNN using the original FPN. It can be seen that the proposed feature pyramid network algorithm can replace the original FPN and be better applied in object detection scenarios.

Key words: feature pyramid, object detection, context information, multi-scale feature fusion, attention mechanism
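
The abstract outlines three stages: parallel atrous convolutions that widen the receptive field while keeping resolution, multi-level fusion (MLF) that rescales pyramid levels to a common resolution and weights them by importance, and attention-guided refinement (AFRM) of the fused map. The Python (PyTorch) sketch below is only a minimal illustration of such a pipeline; the dilation rates, channel widths, learnable per-level weights, and the squeeze-and-excitation style channel attention are assumptions made for this example and are not taken from the paper.

# Minimal PyTorch sketch of the pipeline described in the abstract.
# All structural details (dilation rates, channel widths, SE-style attention)
# are illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextModule(nn.Module):
    """Enrich a feature map with multi-scale context via parallel atrous convolutions."""
    def __init__(self, channels, dilations=(1, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations]
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Every branch keeps the spatial resolution; only the receptive field grows.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))


class MultiLevelFusion(nn.Module):
    """Resize all pyramid levels to a common resolution, weight them, and fuse."""
    def __init__(self, num_levels):
        super().__init__()
        # Learnable per-level importance weights (softmax-normalised in forward).
        self.level_weights = nn.Parameter(torch.ones(num_levels))

    def forward(self, feats, target_size):
        # The paper mentions upsampling or pooling; nearest interpolation is used
        # here for both directions, purely for simplicity.
        resized = [f if f.shape[-2:] == target_size
                   else F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        w = torch.softmax(self.level_weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))


class AttentionRefinement(nn.Module):
    """Refine the fused map with channel attention to suppress fusion aliasing."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Global average pooling -> channel gate, then a smoothing convolution.
        gate = self.fc(x.mean(dim=(2, 3)))[..., None, None]
        return self.smooth(x * gate)


if __name__ == "__main__":
    # Toy three-level pyramid of 256-channel backbone features.
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
    ctx = ContextModule(256)
    fuse = MultiLevelFusion(num_levels=3)
    refine = AttentionRefinement(256)
    fused = fuse([ctx(f) for f in feats], target_size=(32, 32))
    print(refine(fused).shape)  # torch.Size([1, 256, 32, 32])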
