Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (10): 3025-3032. DOI: 10.11772/j.issn.1001-9081.2021091571

• Artificial Intelligence •


Few-shot object detection based on attention mechanism and secondary reweighting of meta-features

Runchao LIN, Rong HUANG, Aihua DONG   

  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2021-09-06 Revised: 2022-01-10 Accepted: 2022-01-17 Online: 2022-04-15 Published: 2022-10-10
  • Contact: Aihua DONG (dongaihua@dhu.edu.cn)
  • About author: LIN Runchao, born in 1996 in Yibin, Sichuan, M.S. candidate. His research interests include deep learning and image processing.
    HUANG Rong, born in 1985 in Shaoxing, Zhejiang, Ph.D., lecturer. His research interests include deep learning and image understanding.
    DONG Aihua, born in 1970 in Jiading, Shanghai, Ph.D., associate professor. Her research interests include artificial intelligence for textile and apparel.
  • Supported by:
    National Key Research and Development Program of China (2019YFC1521300)


Abstract:

In few-shot object detection based on transfer learning, existing models lack an attention mechanism that focuses on the objects to be detected, so their ability to suppress the background regions surrounding those objects is weak. Moreover, transfer learning usually requires fine-tuning the meta-features to achieve cross-domain sharing, which causes meta-feature shift and degrades the model's ability to detect large-sample images. To address these problems, an improved meta-feature transfer model, Up-YOLOv3, was proposed based on an attention mechanism and a secondary meta-feature reweighting mechanism. First, a Convolutional Block Attention Module (CBAM)-based attention mechanism was introduced into the original meta-feature transfer model Base-YOLOv2, so that the feature extraction network focused on the object regions in the image and attended to the detailed features of each object class, improving detection performance on few-shot image objects. Second, a Squeeze-and-Excitation based Secondary Meta-Feature Reweighting (SE-SMFR) module was introduced to reweight the meta-features of large-sample images a second time, so that the model not only improved few-shot detection performance but also reduced the weight shift of the meta-feature information of large-sample images. Experimental results on the PASCAL VOC2007/2012 datasets show that, compared with Base-YOLOv2, Up-YOLOv3 improves the mean Average Precision (mAP) on few-shot images by 2.3 to 9.1 percentage points, and that, compared with the original YOLOv3-based meta-feature transfer model Base-YOLOv3, it improves the mAP on large-sample images by 1.8 to 2.4 percentage points. The improved model therefore generalizes well and is robust on both large-sample and few-shot images of different classes.
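The two mechanisms named in the abstract, attention over the feature map and SE-style channel reweighting, can be illustrated with a minimal NumPy sketch. This is a generic illustration, not the paper's implementation: the function names, toy tensor shapes, random weights, and the fixed avg/max mixing weights (standing in for CBAM's learned convolution over the pooled maps) are all assumptions, and CBAM's own channel-attention branch is folded into the SE-style function here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_reweight(features, w1, w2):
    """SE-style channel reweighting: squeeze (global average pool),
    then excitation (two FC layers + sigmoid), scaling each channel.
    features: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeezed = features.mean(axis=(1, 2))                    # (C,) global average pool
    excited = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (C,) weights in (0, 1)
    return features * excited[:, None, None]                 # broadcast per-channel scale

def spatial_attention(features, mix=(0.5, 0.5)):
    """CBAM-flavoured spatial attention: pool across channels with avg and
    max, mix the two maps (fixed weights here instead of a learned conv),
    squash with sigmoid, and gate every channel by the resulting map."""
    avg_map = features.mean(axis=0)                          # (H, W)
    max_map = features.max(axis=0)                           # (H, W)
    attn = sigmoid(mix[0] * avg_map + mix[1] * max_map)      # (H, W) in (0, 1)
    return features * attn[None, :, :]

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))      # toy meta-feature map: 8 channels, 4x4
r = 2                                        # SE reduction ratio
w1 = rng.standard_normal((8 // r, 8)) * 0.1
w2 = rng.standard_normal((8, 8 // r)) * 0.1

# Spatial attention first, then a (secondary) SE-style channel reweighting.
reweighted = se_reweight(spatial_attention(feats), w1, w2)
print(reweighted.shape)  # (8, 4, 4)
```

Because both gates are sigmoid outputs in (0, 1), the composed operation can only attenuate activations, never amplify them, which matches the intuition of suppressing background responses while preserving the feature-map shape.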

Key words: few-shot object detection, meta-feature transfer, feature reweighting, attention mechanism, secondary reweighting
