Instance segmentation algorithm based on Fastformer and self-supervised contrastive learning

doi:10.11772/j.issn.1001-9081.2022020270

Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1062-1070. DOI: 10.11772/j.issn.1001-9081.2022020270

Special Issue: Artificial intelligence

Rong GAO¹,², Jiawei SHEN¹, Xiongkai SHAO¹, Xinyun WU¹

¹School of Computer Science, Hubei University of Technology, Wuhan Hubei 430068, China
²State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing Jiangsu 210023, China

Received: 2022-03-09  Revised: 2022-05-20  Accepted: 2022-05-20  Online: 2022-08-16  Published: 2023-04-10
Contact: Rong GAO
About the authors: SHEN Jiawei, born in 1998, from Huanggang, Hubei, M. S. candidate. His research interests include object detection and instance segmentation.
SHAO Xiongkai, born in 1963, from Huanggang, Hubei, Ph. D., professor, CCF member. His research interests include machine learning and image processing.
WU Xinyun, born in 1987, from Yichang, Hubei, Ph. D., associate professor. His research interests include algorithms for solving combinatorial optimization problems.
Supported by: National Natural Science Foundation of China (61902116); Open Project of State Key Laboratory for Novel Software Technology in Nanjing University (KFKT2021B12); Hubei Provincial High-level Talent Fund (GCRC2020011); Doctoral Research Start-Up Fund of Hubei University of Technology (BSQD2019026)

Abstract

To address the low detection precision, coarse masks, and weak generalization ability of existing instance segmentation algorithms on occluded and blurred instances, an instance segmentation algorithm based on Fastformer and self-supervised contrastive learning was proposed. Firstly, to strengthen the algorithm's ability to extract global information from feature maps, a Fastformer module based on additive attention was added after the feature extraction network to model in depth the interrelationships between pixels in each layer of the feature map. Secondly, inspired by self-supervised learning, a self-supervised contrastive learning module was added to perform contrastive learning on the instances in an image, which strengthens the algorithm's understanding of images and thereby improves segmentation in environments with heavy noise interference. Experimental results on the Cityscapes and COCO2017 datasets show that, compared with the recent classical instance segmentation algorithm SOLOv2 (Segmenting Objects by LOcations v2), the proposed algorithm improves the mean Average Precision (mAP) by 3.1 and 2.5 percentage points, respectively. The proposed algorithm also achieves a good balance between real-time performance and precision, and shows good robustness in instance segmentation of complex scenes.

Key words: instance segmentation, feature extraction, Fastformer, additive attention, self-supervised contrastive learning
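The abstract does not give the internals of the Fastformer module. As an illustration only, the following is a minimal single-head sketch of the additive-attention layer described in the Fastformer paper by Wu et al. (2021), applied to a feature map by treating each pixel's channel vector as one token. The token layout, the random weights, and all function names (`feature_map_to_tokens`, `fastformer_layer`, `random_params`, and so on) are assumptions made for this sketch, not the authors' implementation; plain Python is used so the arithmetic stays visible.

```python
import math
import random


def matvec(x, W):
    """Row vector x (length d_in) times matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_pool(vectors, w):
    """Summarize N vectors as one global vector weighted by softmax(w . v / sqrt(d))."""
    d = len(w)
    weights = softmax([sum(a * b for a, b in zip(w, v)) / math.sqrt(d) for v in vectors])
    return [sum(a * v[j] for a, v in zip(weights, vectors)) for j in range(d)]


def fastformer_layer(tokens, params):
    """One additive-attention layer; its cost grows linearly with the number of tokens."""
    q = [matvec(t, params["Wq"]) for t in tokens]
    k = [matvec(t, params["Wk"]) for t in tokens]
    v = [matvec(t, params["Wv"]) for t in tokens]
    q_global = additive_pool(q, params["wq"])                # global query vector
    p = [[g * x for g, x in zip(q_global, ki)] for ki in k]  # query-key interaction
    k_global = additive_pool(p, params["wk"])                # global key vector
    u = [[g * x for g, x in zip(k_global, vi)] for vi in v]  # key-value interaction
    # Linear transform of each interaction vector, plus a residual from its query
    return [[r + x for r, x in zip(matvec(ui, params["Wo"]), qi)] for ui, qi in zip(u, q)]


def random_params(d, seed=0):
    """Hypothetical randomly initialized weights, for demonstration only."""
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(d)]

    def vec():
        return [rng.gauss(0.0, d ** -0.5) for _ in range(d)]

    return {"Wq": mat(), "Wk": mat(), "Wv": mat(), "Wo": mat(), "wq": vec(), "wk": vec()}


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map (nested lists) into H*W tokens of length C."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(C)] for y in range(H) for x in range(W)]


if __name__ == "__main__":
    rng = random.Random(1)
    fmap = [[[rng.random() for _ in range(4)] for _ in range(3)] for _ in range(8)]  # C=8, H=3, W=4
    out = fastformer_layer(feature_map_to_tokens(fmap), random_params(d=8))
    print(len(out), len(out[0]))  # 12 8
```

Because the global query and global key are weighted sums over all tokens, every output vector still depends on every pixel of the feature map, while the cost grows linearly with H×W rather than quadratically as in standard dot-product self-attention.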
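The abstract likewise does not specify the loss used by the self-supervised contrastive learning module. A common choice in contrastive methods such as MoCo and SimCLR is the InfoNCE loss, sketched below under the assumption that each instance yields two embeddings (for example from two augmented views) and that the other instances in the batch act as negatives. The function name `info_nce` and the temperature of 0.1 are illustrative assumptions, not values taken from the article.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: view_a[i] should match view_b[i] rather than any other view_b[j].

    view_a, view_b: lists of N embedding vectors; row i of both lists comes from instance i.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of instance i to every candidate, scaled by the temperature
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the positive at index i
    return total / len(a)


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(emb, emb) < info_nce(emb, emb[::-1]))  # True: matched views give a lower loss
```

Minimizing this loss pulls the two embeddings of the same instance together and pushes embeddings of different instances apart, which is the general mechanism a contrastive module relies on; how the article samples views and negatives is not stated in the abstract.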

CLC Number: TP391.4

Rong GAO, Jiawei SHEN, Xiongkai SHAO, Xinyun WU. Instance segmentation algorithm based on Fastformer and self-supervised contrastive learning[J]. Journal of Computer Applications, 2023, 43(4): 1062-1070.

Figures/Tables 13

Fig. 1 Instance segmentation results of SOLOv2 in different scenes

Fig. 2 Overall architecture of the proposed algorithm

Fig. 3 Input image and simulation result of pre-processed image

Fig. 4 Simulation results of feature maps by backbone network

Fig. 5 Fastformer module

Fig. 6 Simulation results of input and output feature maps of Fastformer

Fig. 7 Three branch modules of prediction network

Tab. 1 Comparison of experimental results on Cityscapes dataset

Algorithm	Speed/(frame·s⁻¹)	mAP/%	AP₅₀/%	AP/% (person)	AP/% (rider)	AP/% (car)	AP/% (bus)
SGN	0.61	25.0	44.9	21.8	20.1	39.4	33.2
Mask R-CNN	0.72	25.7	45.2	20.6	23.2	40.5	32.7
MEInst	1.34	26.4	47.1	23.7	24.9	42.7	33.1
PANet	2.24	26.2	49.9	30.5	23.7	46.9	32.2
CondInst	0.84	31.8	57.1	36.8	30.4	54.8	36.3
Deep snake	4.25	32.4	56.9	37.0	31.9	56.8	38.6
BlendMask	4.67	31.7	58.4	37.2	27.0	56.0	40.5
SOLOv2	4.76	32.6	59.0	35.4	30.6	56.9	41.5
Proposed algorithm	4.49	35.7	62.3	36.9	32.1	60.3	44.7
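A quick way to read Table 1 is to find the best entry in each column. The short Python check below uses values copied from Table 1; the names `TABLE1` and `best_per_column` are only for illustration. It shows that the proposed algorithm leads in mAP, AP₅₀ and the rider, car and bus classes, while SOLOv2 remains the fastest and BlendMask has the highest person AP.

```python
# Values copied from Table 1 (Cityscapes); higher is better in every column
COLUMNS = ["speed", "mAP", "AP50", "person", "rider", "car", "bus"]
TABLE1 = {
    "SGN": [0.61, 25.0, 44.9, 21.8, 20.1, 39.4, 33.2],
    "Mask R-CNN": [0.72, 25.7, 45.2, 20.6, 23.2, 40.5, 32.7],
    "MEInst": [1.34, 26.4, 47.1, 23.7, 24.9, 42.7, 33.1],
    "PANet": [2.24, 26.2, 49.9, 30.5, 23.7, 46.9, 32.2],
    "CondInst": [0.84, 31.8, 57.1, 36.8, 30.4, 54.8, 36.3],
    "Deep snake": [4.25, 32.4, 56.9, 37.0, 31.9, 56.8, 38.6],
    "BlendMask": [4.67, 31.7, 58.4, 37.2, 27.0, 56.0, 40.5],
    "SOLOv2": [4.76, 32.6, 59.0, 35.4, 30.6, 56.9, 41.5],
    "Proposed": [4.49, 35.7, 62.3, 36.9, 32.1, 60.3, 44.7],
}


def best_per_column(table, columns):
    """Return {column: name of the algorithm with the highest value in that column}."""
    return {col: max(table, key=lambda name: table[name][i]) for i, col in enumerate(columns)}


print(best_per_column(TABLE1, COLUMNS))
```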

Fig. 8 Instance segmentation results on Cityscapes dataset

Tab. 2 Comparison of experimental results on COCO2017 dataset

Algorithm	Speed/(frame·s⁻¹)	mAP/%	AP/% (small objects)	AP/% (medium objects)	AP/% (large objects)
SGN	10.2	35.7	19.8	38.7	47.2
Mask R-CNN	15.3	37.5	21.1	39.6	48.3
MEInst	15.0	33.5	19.3	35.7	42.1
PANet	14.6	32.7	20.1	36.8	44.5
CondInst	15.4	37.8	21.0	40.3	48.7
Deep snake	14.2	38.0	20.8	41.8	52.3
BlendMask	15.0	37.8	18.8	40.9	53.6
SOLOv2	13.5	38.2	17.6	41.2	55.4
Proposed algorithm	12.7	40.7	21.3	43.9	57.5
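Table 2 can be read by measuring the proposed algorithm against SOLOv2, the baseline named in the abstract. Using values copied from Table 2 (the helper name `differences` is illustrative), the mAP gain is 2.5 percentage points, matching the abstract, and the largest per-size gain is on small objects (3.7 points), at a cost of 0.8 frame/s.

```python
# Values copied from Table 2 (COCO2017)
COLUMNS = ["speed", "mAP", "AP_small", "AP_medium", "AP_large"]
SOLOV2 = [13.5, 38.2, 17.6, 41.2, 55.4]
PROPOSED = [12.7, 40.7, 21.3, 43.9, 57.5]


def differences(model, baseline, columns):
    """Per-column difference model - baseline, rounded to 2 decimals to hide float noise."""
    return {c: round(m - b, 2) for c, m, b in zip(columns, model, baseline)}


print(differences(PROPOSED, SOLOV2, COLUMNS))
```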

Fig. 9 Instance segmentation results on COCO2017 dataset

Fig. 10 Instance segmentation results on Cityscapes dataset of the model trained on COCO2017 dataset

Tab. 3 Experimental results of module analysis

Algorithm	mAP/%	AP₅₀/%	Speed/(frame·s⁻¹)
Baseline	32.6	59.0	4.76
Baseline+Fast	34.5	61.2	4.52
Baseline+Fast+Cont	35.7	62.3	4.49
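Table 3 is cumulative, so each module's contribution is the difference between consecutive rows. Reading "Fast" and "Cont" as the Fastformer and contrastive learning modules is an interpretation (the Baseline row matches SOLOv2 in Table 1). A short check with values copied from the table, using the illustrative helper `increments`, gives +1.9 and +1.2 mAP points, adding up to the 3.1-point total, for a combined drop of 0.27 frame/s.

```python
# Rows copied from Table 3: (configuration, mAP %, AP50 %, speed frame/s)
ROWS = [
    ("Baseline", 32.6, 59.0, 4.76),
    ("Baseline+Fast", 34.5, 61.2, 4.52),
    ("Baseline+Fast+Cont", 35.7, 62.3, 4.49),
]


def increments(rows):
    """Change in each metric contributed by each row relative to the previous row."""
    out = []
    for (_, *prev), (name, *cur) in zip(rows, rows[1:]):
        out.append((name, *(round(c - p, 2) for c, p in zip(cur, prev))))
    return out


for row in increments(ROWS):
    print(row)
```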
