Light-weight road image semantic segmentation algorithm based on deep learning

doi:10.11772/j.issn.1001-9081.2020081181

Abstract

Deep-learning models for semantic segmentation of road images have a huge number of parameters and high computational complexity, which makes them unsuitable for real-time segmentation on mobile terminals. To address this problem, MUNet, a lightweight symmetric U-shaped encoder-decoder semantic segmentation network built from depthwise separable convolutions, was proposed. First, a U-shaped encoder-decoder network was designed. Then, sparse short connections were added between the convolution blocks. Finally, an attention mechanism and Group Normalization (GN) were introduced, reducing the number of model parameters and the amount of computation while improving segmentation accuracy. On the CamVid road image dataset, after 1 000 rounds of training, MUNet achieved a Mean Intersection over Union (MIoU) of 61.92% with test images cropped to 720×720. Experimental results show that, compared with common semantic segmentation networks such as Pyramid Scene Parsing Network (PSPNet), RefineNet, Global Convolutional Network (GCN) and DeepLabv3+, MUNet has fewer parameters and less computation while achieving better segmentation performance.
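As a rough illustration of why depthwise separable convolution reduces the parameter count, the sketch below compares the number of weights in a standard convolution with those in a depthwise convolution followed by a 1×1 pointwise convolution. The function names and the layer sizes (64 input channels, 128 output channels, 3×3 kernel) are assumptions chosen for illustration and are not taken from MUNet; bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration (not from the paper).
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard one; in general the ratio is 1/C_out + 1/k², so the saving grows with the number of output channels and the kernel size.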

Keywords: deep learning, road image semantic segmentation, depthwise separable convolution, lightweight neural network, attention mechanism

CLC Number: TP183

HU Die, FENG Ziliang. Light-weight road image semantic segmentation algorithm based on deep learning[J]. Journal of Computer Applications, 2021, 41(5): 1326-1331.

胡嵽, 冯子亮. 基于深度学习的轻量级道路图像语义分割算法[J]. 计算机应用, 2021, 41(5): 1326-1331.
