Few-shot object detection method based on improved region proposal network and feature aggregation

doi:10.11772/j.issn.1001-9081.2023121731

Journal of Computer Applications, 2024, Vol. 44, Issue (12): 3790-3797.

Keyi FU¹, Gaocai WANG¹, Man WU¹,²,³

1. School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi 530004, China
2. Guangxi Key Laboratory of Marine Environmental Science (Guangxi Academy of Sciences), Nanning, Guangxi 530007, China
3. Research Center for Carbon Sink and Low-Carbon Engineering in the Beibu Gulf of Guangxi (Guangxi Academy of Sciences), Nanning, Guangxi 530007, China

Received: 2023-12-18 Revised: 2024-02-14 Accepted: 2024-02-28 Online: 2024-03-21 Published: 2024-12-10
Contact: Man WU
About author: FU Keyi, born in 2000 in Hengyang, Hunan, M. S. candidate. Her research interests include few-shot object detection and few-shot learning.
WANG Gaocai, born in 1976 in Guilin, Guangxi, Ph. D., professor, CCF member. His research interests include computer networks, system performance evaluation, and randomized algorithms.
Supported by:
Key Project of National Key Research and Development Program of China (2022YFD2401200); Guangxi Science and Technology Major Project (Guike AA22068072)

Abstract

In existing few-shot object detection methods, the Region Proposal Network (RPN) is usually trained on base-class data and then used to generate proposals for novel classes. However, novel-class data are far scarcer than base-class data, and novel-class images may contain complex backgrounds that differ from the target objects, so the RPN can mistake background for foreground and miss proposals with high Intersection over Union (IoU). To address these problems, a Few-Shot Object Detection method based on Improved RPN and Feature Aggregation (IFA-FSOD) is proposed. First, the RPN is improved by adding a metric-based non-linear classifier that computes the similarity between the features extracted by the backbone network and the novel-class features, which raises the recall of novel-class proposals and thereby helps select high-IoU proposals. Second, an attention-based Feature Aggregation Module (FAM) is introduced into Region of Interest Align (RoI Align); grids of different scales are designed to obtain more comprehensive information and feature representations, which alleviates the loss of feature information caused by scale differences. Experimental results show that, compared with the QA-FewDet (Query Adaptive Few-shot object Detection) method, IFA-FSOD improves the novel-class average precision at 50% IoU (nAP50) by 4.5 percentage points in the 10-shot setting of Novel Set 3 on the PASCAL VOC dataset; compared with the FsDetView (Few-shot object Detection and Viewpoint estimation) method, IFA-FSOD improves the mean Average Precision (mAP) on the novel classes of the COCO dataset by 0.2 and 0.8 percentage points in the 10-shot and 30-shot settings, respectively. These results indicate that Improved RPN and Feature Aggregation (IFA) effectively improves detection of the target classes in the few-shot setting and addresses the problems of missed high-IoU proposals and incomplete capture of feature information.
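To make the proposal-recall problem described in the abstract concrete, the following minimal Python sketch ranks three toy proposals either by a base-class objectness score or by a metric-based similarity to a novel-class prototype, and then measures recall at an IoU threshold of 0.5. Everything in it is illustrative: the boxes, scores, feature vectors, helper names, and the form of `metric_score` (a sigmoid of a scaled cosine similarity) are assumptions, not the classifier defined in the paper.

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def metric_score(feat, prototype, scale=5.0):
    """Assumed stand-in for a metric-based non-linear classifier."""
    return 1.0 / (1.0 + math.exp(-scale * cosine(feat, prototype)))


def recall_at(kept_boxes, gt_boxes, thr=0.5):
    """Fraction of ground-truth boxes matched by at least one kept box."""
    hit = sum(1 for g in gt_boxes if any(iou(p, g) >= thr for p in kept_boxes))
    return hit / len(gt_boxes)


# Hypothetical toy data: one novel-class object and three proposals.
gt_boxes = [(10, 10, 50, 50)]
prototype = (1.0, 0.0, 0.0)  # e.g. an averaged support feature
proposals = [
    # (box, base-class objectness, proposal feature)
    ((12, 12, 52, 52), 0.30, (0.9, 0.1, 0.0)),    # on the object, low objectness
    ((60, 60, 100, 100), 0.90, (0.0, 0.2, 0.9)),  # background, high objectness
    ((0, 0, 30, 30), 0.60, (0.4, 0.6, 0.2)),      # partial overlap
]


def top_k(key, k=2):
    """Boxes of the k best proposals under the given ranking key."""
    ranked = sorted(proposals, key=key, reverse=True)
    return [box for box, _, _ in ranked[:k]]


by_objectness = top_k(lambda p: p[1])
by_metric = top_k(lambda p: metric_score(p[2], prototype))

recall_objectness = recall_at(by_objectness, gt_boxes)
recall_metric = recall_at(by_metric, gt_boxes)
print(recall_objectness, recall_metric)
```

In this toy setup the objectness ranking keeps the background box and drops the only proposal with IoU above 0.5, while the similarity ranking keeps it. The data were chosen to show that effect and say nothing about how often it occurs in practice.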

Key words: Few-Shot Object Detection (FSOD), metric-based, Region Proposal Network (RPN), non-linear classifier, feature aggregation
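The feature aggregation idea from the abstract (multi-scale grids plus attention) can be illustrated with a small sketch. It is only an assumption-laden reading of the abstract: the grid sizes `(1, 2, 4)`, the average pooling, the scaled dot-product attention against a class `query` vector, and all function names are hypothetical choices made for this example, not the FAM architecture from the paper.

```python
import math


def grid_pool(fmap, g):
    """Average-pool an H x W x C feature map into g*g cell vectors.

    Assumes H >= g and W >= g so that no cell is empty.
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    cells = []
    for gy in range(g):
        y0, y1 = gy * h // g, (gy + 1) * h // g
        for gx in range(g):
            x0, x1 = gx * w // g, (gx + 1) * w // g
            n = (y1 - y0) * (x1 - x0)
            cells.append([
                sum(fmap[y][x][k] for y in range(y0, y1) for x in range(x0, x1)) / n
                for k in range(c)
            ])
    return cells


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def aggregate(fmap, query, grid_sizes=(1, 2, 4)):
    """Attention-weighted sum of cell vectors pooled at several grid sizes."""
    cells = [cell for g in grid_sizes for cell in grid_pool(fmap, g)]
    d = len(query)
    logits = [sum(q * v for q, v in zip(query, cell)) / math.sqrt(d) for cell in cells]
    weights = softmax(logits)
    out = [sum(wt * cell[k] for wt, cell in zip(weights, cells)) for k in range(d)]
    return out, weights


# Hypothetical 4 x 4 x 2 RoI feature map: channel 0 fires on the object
# (top-left quadrant), channel 1 fires on the background.
fmap = [
    [[1.0, 0.0] if (y < 2 and x < 2) else [0.0, 1.0] for x in range(4)]
    for y in range(4)
]
query = [1.0, 0.0]  # hypothetical class feature that matches channel 0

out, weights = aggregate(fmap, query)
print(len(weights), [round(v, 3) for v in out])
```

Pooling at several grid sizes gives the attention step both a coarse summary of the whole region and finer cells, so cells that resemble the query receive more weight than they would under a single global average.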

CLC number: TP391.41

Keyi FU, Gaocai WANG, Man WU. Few-shot object detection method based on improved region proposal network and feature aggregation[J]. Journal of Computer Applications, 2024, 44(12): 3790-3797.

Figures/Tables 13

References 34

1	SHI Y Y, SHI D X, QIAO Z T, et al. A survey on recent advances in few-shot object detection[J]. Chinese Journal of Computers, 2023, 46(8): 1753-1780.
2	HUANG Y W, DOU H, XIAO G G. Few-shot object detection based on fusion of classification correction and sample amplification[J]. Computer Engineering and Applications, 2024, 60(1): 254-262.
3	KÖHLER M, EISENBACH M, GROSS H M. Few-shot object detection: a comprehensive survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(9): 11958-11978.
4	KANG B, LIU Z, WANG X, et al. Few-shot object detection via feature reweighting[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8419-8428.
5	WANG X, HUANG T E, DARRELL T, et al. Frustratingly simple few-shot object detection[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 9919-9928.
6	YAN X, CHEN Z, XU A, et al. Meta R-CNN: towards general solver for instance-level low-shot learning[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9576-9585.
7	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
8	HU H, BAI S, LI A, et al. Dense relation distillation with context-aware aggregation for few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10180-10189.
9	HAN G, HUANG S, MA J, et al. Meta Faster R-CNN: towards accurate few-shot object detection with attentive feature alignment[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 780-789.
10	LE JEUNE P, MOKRAOUI A. A comparative attention framework for better few-shot object detection on aerial images[EB/OL]. [2023-11-10].
11	XIAO Y, LEPETIT V, MARLET R. Few-shot object detection and viewpoint estimation for objects in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3090-3106.
12	WANG Y X, RAMANAN D, HEBERT M. Meta-learning to detect rare objects[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9924-9933.
13	LI Y, FENG W, LYU S, et al. Feature reconstruction and metric based network for few-shot object detection[J]. Computer Vision and Image Understanding, 2023, 227: No.103600.
14	LI A, LI Z. Transformation invariant few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 3093-3101.
15	LI B, YANG B, LIU C, et al. Beyond max-margin: class margin equilibrium for few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 7359-7368.
16	HAN G, MA J, HUANG S, et al. Few-shot object detection with fully cross-transformer[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 5311-5320.
17	FAN Q, ZHUO W, TANG C K, et al. Few-shot object detection with Attention-RPN and Multi-Relation Detector[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4012-4021.
18	ZHANG L, ZHOU S, GUAN J, et al. Accurate few-shot object detection with support-query mutual guidance and hybrid loss[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 14419-14427.
19	HAN G, HE Y, HUANG S, et al. Query adaptive few-shot object detection with heterogeneous graph convolutional networks[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3243-3252.
20	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
21	FAN Z, MA Y, LI Z, et al. Generalized few-shot object detection without forgetting[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 4525-4534.
22	WU A, HAN Y, ZHU L, et al. Universal-prototype enhancing for few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9547-9556.
23	HAN J, REN Y, DING J, et al. Few-shot object detection via variational feature aggregation[C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 755-763.
24	SUN B, LI B, CAI S, et al. FSCE: few-shot object detection via contrastive proposal encoding[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 7348-7358.
25	ZHU C, CHEN F, AHMED U, et al. Semantic relation reasoning for shot-stable few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8778-8787.
26	QIAO L, ZHAO Y, LI Z, et al. DeFRCN: decoupled Faster R-CNN for few-shot object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 8661-8670.
27	KAUL P, XIE W, ZISSERMAN A. Label, verify, correct: a simple few shot object detection method[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 14217-14227.
28	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
29	KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1. Red Hook: Curran Associates Inc., 2012: 1097-1105.
30	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-308.
31	EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136.
32	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
33	YAO J, SHI T Y, CHE X P, et al. DA-FSOD: a novel data augmentation scheme for few-shot object detection[J]. IEEE Access, 2023, 11: 92100-92110.
34	WU J, LIU S, HUANG D, et al. Multi-scale positive sample refinement for few-shot object detection[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12361. Cham: Springer, 2020: 456-472.

nAP50 (%) on the PASCAL VOC novel classes (Novel Sets 1-3)
Method	Set 1 1-shot	Set 1 2-shot	Set 1 3-shot	Set 1 5-shot	Set 1 10-shot	Set 2 1-shot	Set 2 2-shot	Set 2 3-shot	Set 2 5-shot	Set 2 10-shot	Set 3 1-shot	Set 3 2-shot	Set 3 3-shot	Set 3 5-shot	Set 3 10-shot
FSRW [4]	14.8	15.5	26.7	33.9	47.2	15.7	15.3	22.7	30.1	40.5	21.3	25.6	28.4	42.8	45.9
Meta R-CNN [6]	19.9	25.5	35.0	45.7	51.5	10.4	19.4	29.6	34.8	45.4	14.3	18.2	27.5	41.2	48.1
MetaDet [12]	18.9	20.6	30.2	36.8	49.6	21.8	23.1	27.8	31.7	43.0	20.6	23.9	29.4	43.9	44.1
TFA [5]	39.8	36.1	44.7	55.7	56.0	23.5	26.9	34.1	35.1	39.1	30.8	34.8	42.8	49.5	49.8
SRR-FSD [25]	47.8	50.5	51.3	55.2	56.8	32.5	35.3	39.1	40.8	43.8	40.1	41.5	44.3	46.9	46.4
QA-FewDet [19]	42.4	51.9	55.7	62.6	63.4	25.9	37.8	46.6	48.9	51.1	35.2	42.9	47.8	54.8	53.5
DA-FSOD [33]	33.4	45.1	47.1	53.1	60.0	24.2	31.4	39.5	43.9	49.0	24.5	36.1	42.3	49.2	54.5
FSCE [24]	32.9	44.0	46.8	52.9	59.7	23.7	30.6	38.4	38.4	48.5	22.6	33.4	39.5	47.3	54.0
G-FSD [21]	42.4	45.8	45.9	53.7	56.1	21.7	27.8	35.2	37.0	40.3	30.2	37.6	43.0	49.7	50.1
FSOD-UP [22]	43.8	47.8	50.3	55.4	61.7	31.2	30.5	41.2	42.2	48.3	35.5	39.7	43.9	50.6	53.5
DCNet [8]	33.9	37.4	43.7	51.1	59.6	23.2	24.8	30.6	36.7	46.6	32.3	34.9	39.7	42.6	50.7
IFA-FSOD	30.1	52.3	58.7	62.4	65.4	27.7	37.9	38.0	42.5	48.6	21.9	44.5	49.8	55.5	58.0
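The 4.5-point gain over QA-FewDet quoted in the abstract can be checked directly against the table above. The short Python sketch below copies the QA-FewDet and IFA-FSOD rows and computes the per-setting margin in percentage points; the variable names are illustrative and the numbers are taken from the table, not recomputed from any model.

```python
# Column order of the table: three novel sets, five shot settings each.
SETTINGS = [
    (split, shots)
    for split in ("Novel Set 1", "Novel Set 2", "Novel Set 3")
    for shots in ("1-shot", "2-shot", "3-shot", "5-shot", "10-shot")
]

# nAP50 rows copied from the table above.
QA_FEWDET = [42.4, 51.9, 55.7, 62.6, 63.4,
             25.9, 37.8, 46.6, 48.9, 51.1,
             35.2, 42.9, 47.8, 54.8, 53.5]
IFA_FSOD = [30.1, 52.3, 58.7, 62.4, 65.4,
            27.7, 37.9, 38.0, 42.5, 48.6,
            21.9, 44.5, 49.8, 55.5, 58.0]

# Margin of IFA-FSOD over QA-FewDet, rounded to the table's precision.
margin = {
    setting: round(ours - theirs, 1)
    for setting, ours, theirs in zip(SETTINGS, IFA_FSOD, QA_FEWDET)
}
wins = [setting for setting, m in margin.items() if m > 0]
best_setting = max(margin, key=margin.get)

for (split, shots), m in margin.items():
    print(f"{split:12s} {shots:8s} {m:+.1f}")
print("largest margin:", best_setting, margin[best_setting])
```

On these values the margin is positive in 9 of the 15 settings and is largest in the Novel Set 3, 10-shot column, which is the figure the abstract reports; the 1-shot columns of Novel Sets 1 and 3 show the largest deficits.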

mAP (%) on the COCO novel classes
Method	10-shot mAP	10-shot mAP50	10-shot mAP75	30-shot mAP	30-shot mAP50	30-shot mAP75
TFA w/fc [5]	10.0	19.2	9.2	13.4	24.7	13.2
TFA w/cos [5]	10.0	19.1	9.3	13.7	24.9	13.4
FSRW [4]	5.6	12.3	4.6	9.1	19.0	7.6
MetaDet [12]	7.1	14.6	6.1	11.3	21.7	8.1
Meta R-CNN [6]	8.7	19.1	6.6	12.4	25.3	10.8
MPSR [34]	9.8	17.9	9.7	14.1	25.4	14.2
FSCE [24]	11.9	-	10.5	15.3	-	14.2
FsDetView [11]	12.5	27.3	9.8	14.7	30.6	12.2
SRR-FSD [25]	11.3	23.0	9.8	14.7	29.2	13.5
IFA-FSOD	12.7	24.9	10.6	15.5	31.0	15.1
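Because one row of the table above has missing entries, comparing methods column by column needs a little care. The following sketch parses the rows (with citation markers dropped and missing values written as `-`), finds the best method per column while skipping missing entries, and recomputes the mAP gains over FsDetView quoted in the abstract. The parsing helper and names are illustrative assumptions.

```python
COLUMNS = ["10-shot mAP", "10-shot mAP50", "10-shot mAP75",
           "30-shot mAP", "30-shot mAP50", "30-shot mAP75"]

# Rows copied from the table above; "-" marks a value that is not reported.
ROWS = """\
TFA w/fc 10.0 19.2 9.2 13.4 24.7 13.2
TFA w/cos 10.0 19.1 9.3 13.7 24.9 13.4
FSRW 5.6 12.3 4.6 9.1 19.0 7.6
MetaDet 7.1 14.6 6.1 11.3 21.7 8.1
Meta R-CNN 8.7 19.1 6.6 12.4 25.3 10.8
MPSR 9.8 17.9 9.7 14.1 25.4 14.2
FSCE 11.9 - 10.5 15.3 - 14.2
FsDetView 12.5 27.3 9.8 14.7 30.6 12.2
SRR-FSD 11.3 23.0 9.8 14.7 29.2 13.5
IFA-FSOD 12.7 24.9 10.6 15.5 31.0 15.1
"""


def parse(line):
    """Split a row into (method name, six values), with None for gaps."""
    name, *tokens = line.rsplit(None, len(COLUMNS))
    values = []
    for tok in tokens:
        try:
            values.append(float(tok))
        except ValueError:  # unreported entry
            values.append(None)
    return name, values


table = dict(parse(line) for line in ROWS.strip().splitlines())

# Best method per column, ignoring unreported entries.
best = {}
for i, col in enumerate(COLUMNS):
    scored = [(vals[i], name) for name, vals in table.items() if vals[i] is not None]
    best[col] = max(scored)[1]

gain_10 = round(table["IFA-FSOD"][0] - table["FsDetView"][0], 1)
gain_30 = round(table["IFA-FSOD"][3] - table["FsDetView"][3], 1)

for col in COLUMNS:
    print(f"{col:14s} best: {best[col]}")
print("mAP gain over FsDetView:", gain_10, gain_30)
```

On the table's numbers IFA-FSOD leads every column except 10-shot mAP50, where FsDetView is higher, and the recomputed mAP gains are 0.2 and 0.8 points, matching the abstract.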

Ablation: nAP50 (%) at inference under different few-shot settings (√ = component enabled, × = component disabled)
Metric RPN	FAM	1-shot	2-shot	3-shot	5-shot	10-shot
×	×	29.1	48.5	53.0	56.2	60.8
×	√	29.4	50.9	57.9	60.7	63.6
√	×	29.9	51.5	58.5	59.8	64.1
√	√	30.1	52.3	58.7	62.4	65.4
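The ablation rows above can be turned into per-component gains over the baseline (neither Metric RPN nor FAM enabled). The sketch below does that arithmetic on the table's values; the dictionary layout and function name are illustrative assumptions, with keys ordered as `(metric_rpn, fam)` to mirror the table columns.

```python
SHOTS = ["1-shot", "2-shot", "3-shot", "5-shot", "10-shot"]

# nAP50 copied from the ablation table; keys are (metric_rpn, fam).
ABLATION = {
    (False, False): [29.1, 48.5, 53.0, 56.2, 60.8],
    (False, True): [29.4, 50.9, 57.9, 60.7, 63.6],
    (True, False): [29.9, 51.5, 58.5, 59.8, 64.1],
    (True, True): [30.1, 52.3, 58.7, 62.4, 65.4],
}


def gain(config):
    """Percentage-point gain of a configuration over the baseline, per shot."""
    base = ABLATION[(False, False)]
    return [round(v - b, 1) for v, b in zip(ABLATION[config], base)]


fam_only = gain((False, True))
metric_only = gain((True, False))
both = gain((True, True))

# Does the combined gain fall short of the sum of the individual gains?
sub_additive = [
    b < round(f + m, 1) for b, f, m in zip(both, fam_only, metric_only)
]

for i, shots in enumerate(SHOTS):
    print(f"{shots:8s} FAM {fam_only[i]:+.1f}  Metric RPN {metric_only[i]:+.1f}  "
          f"both {both[i]:+.1f}")
```

On these numbers each component helps on its own at every shot setting, and the combined gain is always smaller than the sum of the two individual gains, which suggests the two components partly address the same errors. The full-model row also coincides with the IFA-FSOD Novel Set 1 results in the PASCAL VOC table.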
