Real-time segmentation algorithm based on attention mechanism and effective factorized convolution

doi:10.11772/j.issn.1001-9081.2021071327

Journal of Computer Applications, 2022, Vol. 42, Issue (9): 2659-2666.

Topic: Artificial intelligence

Kai WEN¹,², Weiwei TANG¹,², Junchen XIONG¹,²

1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Research Center of New Communication Technology Application, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2021-07-23 Revised: 2021-10-14 Accepted: 2021-10-18 Online: 2021-10-29 Published: 2022-09-10
Contact: Weiwei TANG
About authors: WEN Kai, born in 1972 in Chongqing, Ph. D., senior engineer. His research interests include big data, computer vision, and mobile communication.
XIONG Junchen, born in 1997 in Dazhou, Sichuan, M. S. candidate. His research interests include image semantic segmentation and image processing.

Abstract:

Current real-time semantic segmentation algorithms have a high computational cost and a large memory footprint, and therefore cannot meet the requirements of practical scenarios. To address these problems, a new shallow, lightweight real-time semantic segmentation algorithm, AEFNet (real-time segmentation algorithm based on Attention mechanism and Effective Factorized convolution), was proposed. Firstly, the one-dimensional non-bottleneck structure (Non-bottleneck-1D) was used to construct a lightweight factorized convolution module that extracts rich contextual information while reducing the amount of computation; the module also enhances the learning ability of the algorithm in a simple way and facilitates the extraction of detailed information. Then, the pooling operation and the Attention Refinement Module (ARM) were combined to construct a global context attention module that captures global information and refines each stage of the algorithm, thereby improving the segmentation results. The algorithm was validated on the public datasets Cityscapes and CamVid, and achieved an accuracy of 74.0% at an inference speed of 118.9 Frames Per Second (FPS) on the Cityscapes test set. Compared with Depth-wise Asymmetric Bottleneck Network (DABNet), the proposed algorithm improves accuracy by about 4 percentage points and inference speed by 14.7 FPS. Compared with the recent efficient Enhanced Asymmetric Convolution Network (EACNet), its accuracy is 0.2 percentage points lower, but its inference speed is 6.9 FPS higher. Experimental results show that the proposed algorithm can identify scene information fairly accurately while meeting real-time requirements.

Key words: factorized convolution, attention mechanism, spatial detailed information, contextual information, lightweight algorithm
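
The computational saving behind factorized convolution can be illustrated by counting parameters. The following is a minimal sketch, assuming the Non-bottleneck-1D idea of replacing a 3×3 convolution with a 3×1 convolution followed by a 1×3 convolution. The helper names and the 64-channel width are illustrative assumptions and are not taken from the paper.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Number of learnable parameters in one 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def standard_3x3_params(channels):
    """One full 3x3 convolution that keeps the channel count unchanged."""
    return conv2d_params(3, 3, channels, channels)


def factorized_3x3_params(channels):
    """A 3x1 convolution followed by a 1x3 convolution (the 1D pair)."""
    return (conv2d_params(3, 1, channels, channels)
            + conv2d_params(1, 3, channels, channels))


channels = 64  # hypothetical channel width, chosen only for illustration
full = standard_3x3_params(channels)
factorized = factorized_3x3_params(channels)
saving = 1 - factorized / full  # fraction of parameters removed
print(full, factorized, f"{saving:.1%}")
```

Ignoring biases, the pair needs 6C² weights against 9C² for the full kernel, so roughly one third of the parameters are removed regardless of the channel width C.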

CLC number:

TP183

Kai WEN, Weiwei TANG, Junchen XIONG. Real-time segmentation algorithm based on attention mechanism and effective factorized convolution[J]. Journal of Computer Applications, 2022, 42(9): 2659-2666.


References

[1] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3431-3440. DOI: 10.1109/cvpr.2015.7298965
[2] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. DOI: 10.1109/tpami.2016.2644615
[3] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6230-6239. DOI: 10.1109/cvpr.2017.660
[4] LIN G S, MILAN A, SHEN C H, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5168-5177. DOI: 10.1109/cvpr.2017.549
[5] PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. (2016-06-07) [2021-07-15]. DOI: 10.48550/arXiv.1606.02147
[6] EMARA T, MUNIM H E ABD EL, ABBAS H M. LiteSeg: a novel lightweight ConvNet for semantic segmentation[C]// Proceedings of 2019 Digital Image Computing: Techniques and Applications. Piscataway: IEEE, 2019: 1-7. DOI: 10.1109/dicta47822.2019.8945975
[7] LI G, KIM J. DABNet: depth-wise asymmetric bottleneck for real-time semantic segmentation[C]// Proceedings of the 2019 British Machine Vision Conference. Durham: BMVA Press, 2019: No.186. DOI: 10.1109/access.2020.2971760
[8] LI Y Q, LI X K, XIAO C J, et al. EACNet: enhanced asymmetric convolution for real-time semantic segmentation[J]. IEEE Signal Processing Letters, 2021, 28: 234-238. DOI: 10.1109/lsp.2021.3051845
[9] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. (2017-04-17) [2021-08-15]. DOI: 10.48550/arXiv.1704.04861
[10] ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. DOI: 10.1109/tits.2017.2750080
[11] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-12-05) [2021-08-06]. DOI: 10.1007/978-3-030-01234-2_49
[12] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11214. Cham: Springer, 2018: 561-580.
[13] YU C Q, WANG J B, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11217. Cham: Springer, 2018: 334-349.
[14] FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3141-3149. DOI: 10.1109/cvpr.2019.00326
[15] HU P, PERAZZI F, HEILBRON F C, et al. Real-time semantic segmentation with fast attention[J]. IEEE Robotics and Automation Letters, 2021, 6(1): 263-270. DOI: 10.1109/lra.2020.3039744
[16] ZHOU W J, YUAN J Z, LEI J S, et al. TSNet: three-stream self-attention network for RGB-D indoor semantic segmentation[J]. IEEE Intelligent Systems, 2021, 36(4): 73-78. DOI: 10.1109/mis.2020.2999462
[17] HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. DOI: 10.1109/cvpr42600.2020.00165
[18] ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11207. Cham: Springer, 2018: 418-434.
[19] ZHOU Q, WANG Y, FAN Y W, et al. AGLNet: towards real-time semantic segmentation of self-driving images via attention-guided lightweight network[J]. Applied Soft Computing, 2020, 96: No.106682. DOI: 10.1016/j.asoc.2020.106682
[20] WANG Y, ZHOU Q, XIONG J, et al. ESNet: an efficient symmetric network for real-time semantic segmentation[C]// Proceedings of the 2019 Chinese Conference on Pattern Recognition and Computer Vision, LNCS 11858. Cham: Springer, 2019: 41-52.
[21] LI H C, XIONG P F, FAN H Q, et al. DFANet: deep feature aggregation for real-time semantic segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9514-9523. DOI: 10.1109/cvpr.2019.00975
[22] JIANG W H, XIE Z Z, LI Y Y, et al. LRNNet: a light-weighted network with efficient reduced non-local operation for real-time semantic segmentation[C]// Proceedings of 2020 IEEE International Conference on Multimedia and Expo Workshops. Piscataway: IEEE, 2020: 1-6. DOI: 10.1109/icmew46912.2020.9106038
[23] HU X G, WANG H B. Efficient fast semantic segmentation using continuous shuffle dilated convolutions[J]. IEEE Access, 2020, 8: 70913-70924. DOI: 10.1109/access.2020.2987080
[24] WU T Y, TANG S, ZHANG R, et al. CGNet: a light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169-1179. DOI: 10.1109/tip.2020.3042065
[25] LO S Y, HANG H M, CHAN S W, et al. Efficient dense modules of asymmetric convolution for real-time semantic segmentation[C]// Proceedings of the 2019 ACM International Conference on Multimedia in Asia. New York: ACM, 2019: No.1. DOI: 10.1145/3338533.3366558
[26] GAO S W, ZHANG C Z, WANG Z P. Lightweight real-time semantic segmentation algorithm based on separable pyramid[J]. Journal of Computer Applications, 2021, 41(10): 2937-2944. DOI: 10.11772/j.issn.1001-9081.2020121939
[27] QIN F W, SHEN X Y, PENG Y, et al. A real-time semantic segmentation approach for autonomous driving scenes[J]. Journal of Computer-Aided Design and Graphics, 2021, 33(7): 1026-1037. DOI: 10.3724/SP.J.1089.2021.18631
[28] HU D, FENG Z L. Light-weight road image semantic segmentation algorithm based on deep learning[J]. Journal of Computer Applications, 2021, 41(5): 1326-1331. DOI: 10.11772/j.issn.1001-9081.2020081181

Module	mIoU/%	FPS
AEFNet	74.0	118.9
AEFNet (r=4, 4, 4, 4, 4, 4)	72.7	118.7
AEFNet (r=3, 3, 7, 7, 13, 13)	73.3	118.2
AEFNet + ERFNet decoder	74.3	69.7
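
To see why the choice of r matters, the receptive field of a stack of dilated convolutions can be computed directly. This is a rough sketch under two assumptions that the table itself does not state: that r lists the dilation rates of six consecutive modules, and that each module contributes one 3-tap dilated convolution per axis at stride 1. The function names are illustrative.

```python
def effective_kernel(kernel_size, dilation):
    """Span, in pixels, covered by one dilated 1D kernel."""
    return 1 + (kernel_size - 1) * dilation


def stacked_receptive_field(dilations, kernel_size=3):
    """Receptive field along one axis of stride-1 dilated layers applied in sequence."""
    field = 1
    for rate in dilations:
        # Each layer widens the field by the span of its kernel minus one.
        field += effective_kernel(kernel_size, rate) - 1
    return field


# The two alternative configurations listed in the table (assumed to be dilation rates).
uniform = stacked_receptive_field([4, 4, 4, 4, 4, 4])
increasing = stacked_receptive_field([3, 3, 7, 7, 13, 13])
print(uniform, increasing)
```

Under these assumptions the increasing schedule covers a wider context than the uniform one for the same number of layers, which is one plausible reading of the accuracy gap between the two rows.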


GCAM: Avg-pooling+AM	GCAM: Max-pooling+AM	mIoU/%	FPS
√	-	73.1	119.5
-	√	73.5	119.6
-	-	72.6	125.6
√	√	74.0	118.9
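
The two pooling branches compared above can be sketched as a channel attention gate. This is not the paper's GCAM. It is a simplified pure-Python illustration that assumes each branch pools a channel to a scalar, passes it through a shared linear map standing in for a learned 1×1 convolution, and that the two branch scores are summed before a sigmoid. The `weights` matrix, the summation, and all function names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def global_pool(feature_map, reduce_fn):
    """Collapse each channel (a list of rows) to one scalar descriptor."""
    return [reduce_fn([v for row in channel for v in row]) for channel in feature_map]


def channel_attention(feature_map, weights):
    """Rescale each channel by a gate built from average- and max-pooled descriptors.

    `weights` is a hypothetical C x C matrix standing in for a learned 1x1 convolution.
    """
    avg_desc = global_pool(feature_map, mean)
    max_desc = global_pool(feature_map, max)
    gates = []
    for row in weights:
        avg_score = sum(w * d for w, d in zip(row, avg_desc))
        max_score = sum(w * d for w, d in zip(row, max_desc))
        gates.append(sigmoid(avg_score + max_score))  # assumed fusion: sum, then sigmoid
    refined = [[[g * v for v in row] for row in channel]
               for g, channel in zip(gates, feature_map)]
    return refined, gates


# Toy input: 2 channels of size 2x2, with an identity matrix as the stand-in weights.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
refined, gates = channel_attention(features, identity)
print(gates)
```

The second toy channel shows why the two branches complement each other: its average is small because most pixels are zero, while max pooling still picks up the single strong response.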


GCAM	Long connection: stage 1	Long connection: stage 2	Long connection: stage 3	mIoU/%	FPS
√	-	√	√	72.9	119.2
√	√	-	√	73.6	117.3
√	√	√	-	73.2	116.4
√	-	-	-	73.8	119.3


Algorithm	Input size	Pretraining	Parameters/MB	Accuracy/%	FPS
PSPNet [3]	713×713	Yes	65.70	78.4	<1
SegNet [2]	360×640	Yes	29.50	56.1	14.6
ENet [5]	512×1024	No	0.36	58.3	76.9
ESPNet [12]	512×1024	No	0.36	60.3	112.0
ERFNet [10]	512×1024	No	2.10	68.0	41.7
ICNet [18]	1024×2048	Yes	7.80	69.5	30.3
AGLNet [19]	512×1024	No	1.12	70.1	52.0
DABNet [7]	512×1024	No	0.76	70.1	104.2
ESNet [20]	512×1024	Yes	1.66	70.7	63.0
DFANet [21]	1024×1024	Yes	7.80	71.3	100.0
LRNNet [22]	512×1024	No	0.68	72.2	71.0
EACNet [8]	512×1024	No	1.10	74.2	113.0
AEFNet	512×1024	No	1.59	74.0	118.9
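
The headline comparisons can be recomputed from the rows above. A small sketch, with the accuracy and FPS values copied from this table and the helper name `gain_over` introduced only for illustration:

```python
# (accuracy in %, FPS) copied from the comparison table above
results = {
    "DABNet": (70.1, 104.2),
    "EACNet": (74.2, 113.0),
    "AEFNet": (74.0, 118.9),
}


def gain_over(baseline, proposed="AEFNet"):
    """Accuracy gain (percentage points) and speed gain (FPS) of `proposed` over `baseline`."""
    base_acc, base_fps = results[baseline]
    acc, fps = results[proposed]
    return round(acc - base_acc, 1), round(fps - base_fps, 1)


for name in ("DABNet", "EACNet"):
    print(name, gain_over(name))
```

With these table values, the gain over DABNet is 3.9 percentage points and 14.7 FPS, in line with the abstract. The speed gain over EACNet comes out as 5.9 FPS from the 113.0 FPS listed here, whereas the abstract quotes 6.9 FPS.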

Class	ENet	EFSNet	CGNet	ERFNet	DABNet	AEFNet
Road	96.3	96.6	95.5	97.9	96.8	98.0
Sidewalk	74.2	74.9	78.7	82.1	78.5	84.4
Building	75.0	86.4	88.1	90.7	90.9	91.7
Wall	32.2	37.5	40.0	45.2	45.3	48.8
Fence	33.2	39.6	43.0	50.4	50.1	58.1
Pole	43.4	48.0	54.1	59.0	59.1	63.0
Traffic light	34.1	49.8	59.8	62.6	65.2	67.7
Traffic sign	44.0	55.1	63.9	68.4	70.7	75.5
Vegetation	88.6	89.7	89.6	91.9	92.5	92.3
Terrain	61.4	64.9	67.6	69.4	68.1	68.9
Sky	90.6	92.8	92.9	94.2	94.6	93.9
Person	65.5	70.3	74.9	78.5	80.5	80.6
Rider	38.4	51.5	54.9	59.8	58.5	61.4
Car	90.6	90.2	90.2	93.4	92.7	93.9
Truck	36.9	43.0	44.1	52.3	52.7	65.4
Bus	50.5	49.7	59.5	60.8	67.2	78.1
Train	48.1	41.6	25.2	53.7	50.9	52.7
Motorcycle	38.8	41.5	47.3	49.9	50.4	57.6
Bicycle	55.4	53.5	60.2	64.2	65.7	74.4
Class mIoU	58.3	61.9	64.8	69.7	70.1	74.0
Category mIoU	80.4	82.8	85.7	87.3	87.8	88.6
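
The class mIoU row is the unweighted mean of the 19 per-class values. A short sketch that reproduces the AEFNet figure from the column above; the per-class IoU formula is the standard TP / (TP + FP + FN) definition, and the function names are illustrative:

```python
def iou(true_positive, false_positive, false_negative):
    """Intersection over union for one class, from pixel counts."""
    union = true_positive + false_positive + false_negative
    return true_positive / union if union else 0.0


def mean_iou(per_class_iou):
    """Unweighted mean over classes."""
    return sum(per_class_iou) / len(per_class_iou)


# AEFNet per-class IoU (%) in table order, from Road down to Bicycle.
aefnet_iou = [98.0, 84.4, 91.7, 48.8, 58.1, 63.0, 67.7, 75.5, 92.3, 68.9,
              93.9, 80.6, 61.4, 93.9, 65.4, 78.1, 52.7, 57.6, 74.4]

class_miou = mean_iou(aefnet_iou)
print(round(class_miou, 1))  # 74.0, matching the class mIoU row
```

Because the mean is unweighted, rare classes such as Wall or Train pull the score down as much as frequent ones such as Road lift it.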

Algorithm	Input size	Accuracy/%	Frame rate/FPS	Parameters/MB
ENet [5]	360×480	51.3	61.0	0.36
SegNet [2]	360×480	55.6	16.7	29.50
ESPNet [12]	360×480	55.6	132.0	0.36
EDANet [25]	360×480	66.4	163.0	0.68
DABNet [7]	360×480	66.4	117.0	0.76
ICNet [18]	720×960	67.1	30.3	7.80
DFANet [21]	360×480	71.3	100.0	7.80
AEFNet	360×480	67.6	123.6	1.59
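
Frame rates are easier to compare against a latency budget when converted to time per frame. A minimal sketch, using FPS values from the table above; the 30 FPS real-time threshold is a common convention assumed here and is not stated in the source:

```python
def latency_ms(fps):
    """Average processing time per frame in milliseconds."""
    return 1000.0 / fps


REAL_TIME_FPS = 30.0  # assumed threshold, not taken from the paper
budget_ms = latency_ms(REAL_TIME_FPS)

# FPS values copied from the table above
for name, fps in (("SegNet", 16.7), ("ICNet", 30.3), ("AEFNet", 123.6)):
    per_frame = latency_ms(fps)
    print(f"{name}: {per_frame:.2f} ms per frame, within budget: {per_frame <= budget_ms}")
```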
