Instance segmentation algorithm based on Fastformer and self-supervised contrastive learning

Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1062-1070. DOI: 10.11772/j.issn.1001-9081.2022020270

Rong GAO¹,², Jiawei SHEN¹, Xiongkai SHAO¹, Xinyun WU¹

1. School of Computer Science, Hubei University of Technology, Wuhan Hubei 430068, China
2. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing Jiangsu 210023, China

Received: 2022-03-09  Revised: 2022-05-20  Accepted: 2022-05-20  Online: 2022-08-16  Published: 2023-04-10
Contact: Rong GAO
About author: SHEN Jiawei, born in 1998 in Huanggang, Hubei, M. S. candidate. His research interests include object detection and instance segmentation.
SHAO Xiongkai, born in 1963 in Huanggang, Hubei, Ph. D., professor, CCF member. His research interests include machine learning and image processing.
WU Xinyun, born in 1987 in Yichang, Hubei, Ph. D., associate professor. His research interests include algorithms for solving combinatorial optimization problems.
Supported by:
National Natural Science Foundation of China (61902116); Open Project of State Key Laboratory for Novel Software Technology in Nanjing University (KFKT2021B12); Hubei Provincial High-level Talent Fund (GCRC2020011); Doctoral Research Start-Up Fund of Hubei University of Technology (BSQD2019026)

Abstract:

To address the low detection precision, coarse masks and weak generalization ability of existing instance segmentation algorithms on occluded and blurred instances, an instance segmentation algorithm based on Fastformer and self-supervised contrastive learning was proposed. Firstly, to enhance the algorithm's ability to extract global information from feature maps, a Fastformer module based on additive attention was added after the feature extraction network to model in depth the interrelationships between pixels in each layer of the feature maps. Secondly, inspired by self-supervised learning, a self-supervised contrastive learning module was added to perform contrastive learning on the instances in images, enhancing the algorithm's ability to understand images and thereby improving segmentation results in environments with heavy noise interference. Experimental results show that, compared with the recent classical instance segmentation algorithm SOLOv2 (Segmenting Objects by LOcations v2), the proposed algorithm improves the mean Average Precision (mAP) by 3.1 and 2.5 percentage points on the Cityscapes and COCO2017 datasets respectively. The proposed algorithm also achieves a good balance between real-time performance and precision, and shows good robustness in instance segmentation of complex scenes.

Key words: instance segmentation, feature extraction, Fastformer, additive attention, self-supervised contrastive learning
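
As a rough illustration of the two ideas named in the abstract, the following Python sketch shows (a) additive attention in the style of Fastformer, applied to a list of pixel feature vectors, and (b) an InfoNCE-style contrastive loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the learned linear projections of Fastformer are omitted, treating each pixel of a feature map as one token is an assumption, InfoNCE is swapped in as a common contrastive objective because this page does not state the loss actually used, and all function and parameter names (`additive_attention`, `info_nce`, `w_q`, `w_k`, `temperature`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a single vector."""
    dim = len(vectors[0])
    return [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]


def additive_attention(queries, keys, values, w_q, w_k):
    """Fastformer-style additive attention, linear in the number of tokens.

    Assumption: learned Q/K/V projections and the output transform are omitted;
    w_q and w_k stand in for the learned scoring vectors.
    """
    scale = math.sqrt(len(queries[0]))
    # 1. Pool all queries into one global query with additive attention.
    alpha = softmax([dot(w_q, q) / scale for q in queries])
    global_q = weighted_sum(alpha, queries)
    # 2. Mix the global query into every key (element-wise), then pool again.
    mixed = [[g * k_j for g, k_j in zip(global_q, k)] for k in keys]
    beta = softmax([dot(w_k, p) / scale for p in mixed])
    global_k = weighted_sum(beta, mixed)
    # 3. Modulate each value by the global key and add the query as a residual.
    return [
        [g * v_j + q_j for g, v_j, q_j in zip(global_k, v, q)]
        for v, q in zip(values, queries)
    ]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: low when the anchor is closest to its positive."""
    def normalize(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # The positive pair sits at index 0 of the softmax.
    return -math.log(softmax(logits)[0])
```

In this sketch every pixel interacts only with two pooled global vectors, so the cost grows linearly with the number of pixels rather than quadratically as in standard self-attention. The contrastive loss pulls two views of the same instance together and pushes different instances apart in embedding space.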

CLC Number:

T391.4

Rong GAO, Jiawei SHEN, Xiongkai SHAO, Xinyun WU. Instance segmentation algorithm based on Fastformer and self-supervised contrastive learning[J]. Journal of Computer Applications, 2023, 43(4): 1062-1070.

Figures/Tables 13

Fig. 1 Instance segmentation results of SOLOv2 in different scenes

Fig. 2 Overall architecture of the proposed algorithm

Fig. 3 Input image and simulation result of pre-processed image

Fig. 4 Simulation results of feature maps by backbone network

Fig. 5 Fastformer module

Fig. 6 Simulation results of input and output feature maps of Fastformer

Fig. 7 Three branch modules of prediction network

Tab. 1 Comparison of experimental results on Cityscapes dataset

Algorithm	Speed/(frame·s⁻¹)	mAP/%	AP₅₀/%	AP (person)/%	AP (rider)/%	AP (car)/%	AP (bus)/%
SGN	0.61	25.0	44.9	21.8	20.1	39.4	33.2
Mask R-CNN	0.72	25.7	45.2	20.6	23.2	40.5	32.7
MEInst	1.34	26.4	47.1	23.7	24.9	42.7	33.1
PANet	2.24	26.2	49.9	30.5	23.7	46.9	32.2
CondInst	0.84	31.8	57.1	36.8	30.4	54.8	36.3
Deep snake	4.25	32.4	56.9	37.0	31.9	56.8	38.6
BlendMask	4.67	31.7	58.4	37.2	27.0	56.0	40.5
SOLOv2	4.76	32.6	59.0	35.4	30.6	56.9	41.5
Proposed algorithm	4.49	35.7	62.3	36.9	32.1	60.3	44.7

Fig. 8 Instance segmentation results on Cityscapes dataset

Tab. 2 Comparison of experimental results on COCO2017 dataset

Algorithm	Speed/(frame·s⁻¹)	mAP/%	AP (small objects)/%	AP (medium objects)/%	AP (large objects)/%
SGN	10.2	35.7	19.8	38.7	47.2
Mask R-CNN	15.3	37.5	21.1	39.6	48.3
MEInst	15.0	33.5	19.3	35.7	42.1
PANet	14.6	32.7	20.1	36.8	44.5
CondInst	15.4	37.8	21.0	40.3	48.7
Deep snake	14.2	38.0	20.8	41.8	52.3
BlendMask	15.0	37.8	18.8	40.9	53.6
SOLOv2	13.5	38.2	17.6	41.2	55.4
Proposed algorithm	12.7	40.7	21.3	43.9	57.5
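
The gains over SOLOv2 quoted in the abstract can be checked against the values in Tab. 1 and Tab. 2. The short Python sketch below copies the speed and mAP figures for SOLOv2 and the proposed algorithm from the two tables and derives the mAP gain in percentage points and the relative drop in frame rate. The dictionary layout and the function name `compare` are illustrative choices, not part of the paper.

```python
# (speed in frame/s, mAP in %) copied from Tab. 1 (Cityscapes) and Tab. 2 (COCO2017).
RESULTS = {
    "Cityscapes": {"SOLOv2": (4.76, 32.6), "Proposed": (4.49, 35.7)},
    "COCO2017": {"SOLOv2": (13.5, 38.2), "Proposed": (12.7, 40.7)},
}


def compare(dataset):
    """Return (mAP gain in percentage points, relative speed drop in %) vs. SOLOv2."""
    base_fps, base_map = RESULTS[dataset]["SOLOv2"]
    ours_fps, ours_map = RESULTS[dataset]["Proposed"]
    map_gain = round(ours_map - base_map, 1)
    fps_drop_pct = round(100 * (base_fps - ours_fps) / base_fps, 1)
    return map_gain, fps_drop_pct


for name in RESULTS:
    gain, drop = compare(name)
    print(f"{name}: mAP +{gain} points, speed -{drop}%")
```

The differences come out to 3.1 and 2.5 percentage points of mAP, matching the abstract, at a cost of roughly 5.7% and 5.9% of the SOLOv2 frame rate on Cityscapes and COCO2017 respectively.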

Fig. 9 Instance segmentation results on COCO2017 dataset

Fig. 10 Segmentation results on Cityscapes dataset of the model trained on COCO2017 dataset

Tab. 3 Experimental results of module analysis

Algorithm	mAP/%	AP₅₀/%	Speed/(frame·s⁻¹)
Baseline	32.6	59.0	4.76
Baseline+Fast	34.5	61.2	4.52
Baseline+Fast+Cont	35.7	62.3	4.49
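
To read the contribution of each module from Tab. 3, the rows can be differenced step by step. The Python sketch below does this; it assumes that "Fast" denotes the Fastformer module and "Cont" the self-supervised contrastive learning module, which is a reading of the row labels rather than something stated on this page, and the function name `increments` is hypothetical.

```python
# Rows of Tab. 3: (configuration, mAP in %, AP50 in %, speed in frame/s).
ABLATION = [
    ("Baseline", 32.6, 59.0, 4.76),
    ("Baseline+Fast", 34.5, 61.2, 4.52),
    ("Baseline+Fast+Cont", 35.7, 62.3, 4.49),
]


def increments(rows):
    """Difference each row against the previous one to isolate the added module."""
    deltas = []
    for prev, curr in zip(rows, rows[1:]):
        deltas.append((
            curr[0],
            round(curr[1] - prev[1], 1),  # mAP change, percentage points
            round(curr[2] - prev[2], 1),  # AP50 change, percentage points
            round(curr[3] - prev[3], 2),  # speed change, frame/s
        ))
    return deltas


for name, d_map, d_ap50, d_fps in increments(ABLATION):
    print(f"{name}: mAP {d_map:+}, AP50 {d_ap50:+}, speed {d_fps:+} frame/s")
```

Under that reading, the first added module accounts for 1.9 of the 3.1-point mAP gain and most of the speed cost (0.24 frame/s), while the second adds a further 1.2 points for about 0.03 frame/s. The baseline and final rows coincide with the SOLOv2 and proposed-algorithm rows of Tab. 1, which suggests the ablation was run on Cityscapes.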
