Few-shot object detection algorithm based on Siamese network

Journal of Computer Applications, 2023, Vol. 43, Issue (8): 2325-2329. DOI: 10.11772/j.issn.1001-9081.2022121865

Section: The 19th CCF China Conference on Information Systems and Applications

Junjian JIANG¹, Dawei LIU¹, Yifan LIU¹, Yougui REN¹,², Zhibin ZHAO¹

1. School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110169, China
2. Service Center of Natural Resource Affairs of Liaoning Province, Shenyang, Liaoning 110001, China

Received: 2022-12-15 Revised: 2023-02-02 Accepted: 2023-02-08 Online: 2023-04-21 Published: 2023-08-10
Contact: Zhibin ZHAO
About the authors:
JIANG Junjian, born in 1998 in Dandong, Liaoning, M. S. candidate, CCF member. His research interests include machine learning and computer vision.
LIU Dawei, born in 1998 in Shenyang, Liaoning, M. S. candidate, CCF member. His research interests include machine learning and computer vision.
LIU Yifan, born in 1998 in Baoding, Hebei, M. S. candidate, CCF member. His research interests include machine learning and computer vision.
REN Yougui, born in 1981 in Shenyang, Liaoning, Ph. D. candidate. His research interests include remote sensing image processing and computer vision.
Supported by:
National Natural Science Foundation of China (U1811261)

Abstract:

Deep learning based object detection algorithms such as YOLO (You Only Look Once) and Faster R-CNN (Faster Region-Convolutional Neural Network) require large amounts of training data to ensure model precision, yet in many scenarios data are difficult to obtain and costly to label. Without massive training data, the range of objects that can be detected is limited. To address these problems, a few-shot object detection algorithm based on a Siamese network, named SiamDet, was proposed, with the aim of training an object detection model with a certain generalization ability from a few annotated images. Firstly, a Siamese network based on depthwise separable convolution was proposed, and a feature extraction network, ResNet-DW, was designed with depthwise separable convolution to solve the overfitting problem caused by insufficient samples. Secondly, the object detection algorithm SiamDet was proposed based on the Siamese network, and a Region Proposal Network (RPN) was introduced on top of ResNet-DW to locate objects of interest. Thirdly, binary cross-entropy loss was introduced for training, and a contrast training strategy was used to increase the distinction among categories. Experimental results show that SiamDet has good object detection ability under few-shot conditions, and that, compared with the suboptimal algorithm DeFRCN (Decoupled Faster R-CNN), SiamDet improves AP₅₀ by 4.1% on MS-COCO 20-way 2-shot and by 2.6% on PASCAL VOC 5-way 5-shot.

Key words: object detection, few-shot learning, Siamese network, depthwise separable convolution, contrast training

CLC number: TP391

Junjian JIANG, Dawei LIU, Yifan LIU, Yougui REN, Zhibin ZHAO. Few-shot object detection algorithm based on Siamese network[J]. Journal of Computer Applications, 2023, 43(8): 2325-2329.

Figures/Tables (8)

Fig. 1 Overall flow of SiamDet

Fig. 2 Residual modules

Tab. 1 Network structure parameters

Layer	ResNet	ResNet-DW
conv1	7×7 Conv, N=64, stride=2	7×7 Conv, N=64, stride=2
conv2_x	3×3 maxpool, stride=2; [1×1 Conv, N=64; 3×3 Conv, N=64; 1×1 Conv, N=256] × 3	3×3 maxpool, stride=2; [3×3 DW, N=64; 1×1 PW, N=128] × 6
conv3_x	[1×1 Conv, N=128; 3×3 Conv, N=128; 1×1 Conv, N=512] × 4	[3×3 DW, N=128; 1×1 PW, N=256] × 8
conv4_x	[1×1 Conv, N=256; 3×3 Conv, N=256; 1×1 Conv, N=1024] × 23	[3×3 DW, N=256; 1×1 PW, N=512] × 46
conv5_x	[1×1 Conv, N=512; 3×3 Conv, N=512; 1×1 Conv, N=2048] × 3	[3×3 DW, N=512; 1×1 PW, N=512] × 6
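To make the DW and PW entries in Tab. 1 concrete, the following minimal sketch compares the weight count of a standard 3×3 convolution with that of a depthwise (DW) 3×3 convolution followed by a pointwise (PW) 1×1 convolution. The channel widths 64 and 128 are taken from the conv2_x row of ResNet-DW; treating 64 as the input width of the DW layer, omitting biases, and the function names are assumptions made only for illustration, not details confirmed by this page.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def dw_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 conv that mixes channels
    return depthwise + pointwise


# Assumed widths: 64 input channels, 128 output channels (conv2_x row of ResNet-DW).
standard = conv_params(3, 64, 128)
separable = dw_separable_params(3, 64, 128)
ratio = separable / standard  # equals 1/c_out + 1/kernel**2

print(standard, separable, round(ratio, 3))  # 73728 8768 0.119
```

Under these assumptions the separable pair uses roughly 12% of the weights of the standard layer, which is one plausible reason a DW-based backbone is less prone to overfitting when samples are scarce.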

Fig. 3 Process of contrast training
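The abstract states that binary cross-entropy loss is combined with a contrast training strategy. As a rough sketch of how such a loss could be applied to image pairs, the snippet below scores one (support, query) pair with a similarity in (0, 1) and penalizes it with binary cross-entropy, using label 1 for a same-class pair and 0 otherwise. The pairing scheme, the `bce_pair_loss` name, and the example similarity values are all hypothetical; the procedure shown in Fig. 3 is not recoverable from this page.

```python
import math


def bce_pair_loss(similarity: float, same_class: bool, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one (support, query) pair.

    similarity: predicted probability in (0, 1) that the pair shares a class.
    same_class: ground-truth label of the pair.
    """
    p = min(max(similarity, eps), 1.0 - eps)  # clamp to avoid log(0)
    y = 1.0 if same_class else 0.0
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# Hypothetical mini-batch: one positive pair and two negative pairs.
pairs = [(0.9, True), (0.2, False), (0.6, False)]
losses = [bce_pair_loss(s, y) for s, y in pairs]
mean_loss = sum(losses) / len(losses)
print([round(v, 3) for v in losses], round(mean_loss, 3))
```

A negative pair that the model scores as similar (0.6) is penalized more heavily than one it scores as dissimilar (0.2), which is the mechanism by which this kind of training pushes different categories apart.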

Fig. 4 Samples of datasets

Tab. 2 Experimental results on MS-COCO dataset (%)

K	Algorithm	mAP	AP₅₀	AP₇₅
2	TFA	5.4	15.1	4.6
	MSPR	6.7	18.0	6.2
	DeFRCN	10.8	21.9	8.8
	SiamDet	11.0	22.8	8.6
5	TFA	7.7	19.3	6.8
	MSPR	8.7	20.0	8.0
	DeFRCN	13.7	27.8	11.1
	SiamDet	13.7	28.6	11.0

Tab. 3 Experimental results on PASCAL VOC dataset (%)

		Novel Set 1			Novel Set 2			Novel Set 3
K	Algorithm	mAP	AP₅₀	AP₇₅	mAP	AP₅₀	AP₇₅	mAP	AP₅₀	AP₇₅
2	TFA	18.5	36.1	14.2	12.0	26.9	8.6	16.1	34.8	10.5
	MSPR	19.6	40.5	15.8	12.5	30.3	8.9	16.3	39.5	11.0
	DeFRCN	21.7	44.6	18.2	14.8	32.7	10.3	17.8	41.1	13.1
	SiamDet	20.6	44.9	18.2	14.9	33.0	10.5	18.0	41.8	13.4
5	TFA	22.4	46.3	16.1	17.6	38.9	11.8	20.1	42.6	14.2
	MSPR	23.7	50.6	19.8	18.8	41.5	12.1	20.5	44.8	15.6
	DeFRCN	26.8	54.5	21.4	20.1	44.6	14.4	22.4	47.3	17.1
	SiamDet	26.9	55.9	21.5	21.5	45.0	14.4	22.6	47.5	17.6
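The 4.1% and 2.6% AP₅₀ gains quoted in the abstract appear to be relative improvements over DeFRCN rather than differences in percentage points. The short check below reproduces both figures from the AP₅₀ values in Tab. 2 (K=2) and Tab. 3 (K=5, Novel Set 1). Which rows the abstract refers to is an inference from the numbers, not something the page states, and `relative_gain` is a helper name introduced here.

```python
def relative_gain(baseline: float, ours: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# AP50 values copied from the tables: DeFRCN vs. SiamDet.
coco_2shot = relative_gain(21.9, 22.8)      # Tab. 2, K=2
voc_5shot_set1 = relative_gain(54.5, 55.9)  # Tab. 3, K=5, Novel Set 1

print(round(coco_2shot, 1), round(voc_5shot_set1, 1))  # 4.1 2.6
```

In absolute terms these correspond to 0.9 and 1.4 AP₅₀ points respectively.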

Tab. 4 Ablation experiment results

ResNet-DW	ResNet	GDL	Attention-RPN	mAP/%
√				8.6
√		√		11.0
√			√	10.2
	√	√	√	12.5
√		√	√	13.7
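One way to read Tab. 4 is to express each configuration as an mAP gain over the ResNet-DW-only row. The sketch below does that arithmetic with the values copied from the table. The dictionary keys are labels chosen here for readability, and the comparison says nothing about why the components interact as they do.

```python
# mAP (%) values copied from Tab. 4.
baseline = 8.6  # ResNet-DW only
ablation = {
    "+GDL": 11.0,
    "+Attention-RPN": 10.2,
    "+GDL +Attention-RPN": 13.7,
}

# Gain of each ResNet-DW configuration over the baseline, in mAP points.
gains = {name: round(m_ap - baseline, 1) for name, m_ap in ablation.items()}

# Effect of swapping ResNet for ResNet-DW with both components enabled.
backbone_gain = round(13.7 - 12.5, 1)

print(gains, backbone_gain)
```

With these numbers the two components together add 5.1 points, more than the 4.0 points obtained by summing their individual gains, and replacing ResNet with ResNet-DW in the full model adds 1.2 points.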

