Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (7): 1926-1931.DOI: 10.11772/j.issn.1001-9081.2019112054

• Artificial intelligence •

End-to-end autonomous driving model based on deep visual attention neural network

HU Xuemin, TONG Xiuchi, GUO Lin, ZHANG Ruohan, KONG Li   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
  • Received:2019-12-04 Revised:2020-03-27 Online:2020-07-10 Published:2020-06-29
  • Supported by:
    This work is partially supported by the Youth Program of the National Natural Science Foundation of China (61806076) and the Youth Program of the Hubei Provincial Natural Science Foundation (2018CFB158).

  • Corresponding author: GUO Lin
  • About the authors: HU Xuemin (1985-), male, born in Yueyang, Hunan; associate professor, Ph. D.; research interests: computer vision, machine learning. TONG Xiuchi (1996-), female, born in Suizhou, Hubei; M. S. candidate; research interests: machine learning. GUO Lin (1978-), female, born in Suizhou, Hubei; associate professor, Ph. D.; research interests: image processing, machine learning. ZHANG Ruohan (1997-), female, born in Xiangyang, Hubei; M. S. candidate; research interests: deep learning. KONG Li (1995-), male, born in Xianning, Hubei; M. S. candidate; research interests: computer vision.

Abstract: To address the problems of inaccurate driving command prediction, bulky model structure and heavy information redundancy in existing end-to-end autonomous driving methods, a new end-to-end autonomous driving model based on a deep visual attention neural network was proposed. To extract the features of autonomous driving scenes effectively, a visual attention mechanism was introduced into the end-to-end model, yielding a deep visual attention neural network composed of a convolutional neural network, a visual attention layer and a long short-term memory network. The proposed model effectively extracts the spatial and temporal features of driving scene images, focuses on important information and reduces information redundancy, thereby realizing end-to-end autonomous driving that predicts driving commands from the image sequence input by a front-facing camera. Data from a simulated driving environment were used for training and testing. The root mean square errors of the proposed model in predicting the steering angle in four scenes (country road, highway, tunnel and mountain road) are 0.009 14, 0.009 48, 0.002 89 and 0.010 78 respectively, all lower than those of the method proposed by NVIDIA and of the method based on a deep cascaded neural network. Moreover, the proposed model has fewer network layers than the networks without the visual attention mechanism.
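The architecture described in the abstract (a CNN for spatial features, a visual attention layer that weights spatial locations, and an LSTM for temporal features, ending in a steering-angle regression) can be sketched as follows. This is a minimal illustrative implementation in PyTorch; all layer sizes, the `VisualAttentionDriver` name, and the soft-attention formulation (softmax over spatial locations) are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VisualAttentionDriver(nn.Module):
    """Sketch of a CNN + visual-attention + LSTM pipeline that maps a
    sequence of front-camera frames to a steering-angle prediction."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # CNN backbone: extracts a spatial feature map for each frame
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, feat_dim, 3, stride=2), nn.ReLU(),
        )
        # Attention layer: one score per spatial location, softmax-normalized,
        # so the model focuses on important regions and discards redundancy
        self.attn = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # LSTM aggregates the attended per-frame features over time
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # steering-angle regression

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))          # (B*T, C, h, w)
        scores = self.attn(x).flatten(2)            # (B*T, 1, h*w)
        weights = torch.softmax(scores, dim=-1)     # attention over locations
        feats = (x.flatten(2) * weights).sum(-1)    # (B*T, C) attended feature
        out, _ = self.lstm(feats.view(B, T, -1))    # (B, T, hidden)
        return self.head(out[:, -1])                # (B, 1) steering angle

model = VisualAttentionDriver()
angle = model(torch.randn(2, 5, 3, 66, 200))  # 2 clips of 5 frames each
print(angle.shape)  # torch.Size([2, 1])
```

Compared with stacking more convolutional layers, the single 1x1-convolution attention layer keeps the network shallow, which matches the abstract's claim of fewer layers than attention-free alternatives.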

Key words: autonomous driving, end-to-end, visual attention, convolutional neural network, long short-term memory network

