Abstract: To address the problems of low driving-command prediction accuracy, bulky model structure and heavy information redundancy in existing end-to-end autonomous driving methods, a new end-to-end autonomous driving model based on a deep visual attention neural network was proposed. To extract the features of autonomous driving scenes effectively, a visual attention mechanism was introduced into the end-to-end model, yielding a deep visual attention neural network composed of a convolutional neural network, a visual attention layer and a long short-term memory network. The proposed model can effectively extract the spatial and temporal features of driving scene images, focus on important information and reduce information redundancy, thereby realizing end-to-end autonomous driving that predicts driving commands from sequential images captured by a front-facing camera. The model was trained and tested on data from a simulated driving environment. The root mean square errors of the proposed model for steering angle prediction in four scenes, namely country road, highway, tunnel and mountain road, are 0.009 14, 0.009 48, 0.002 89 and 0.010 78 respectively, all lower than those of the method proposed by NVIDIA and the method based on the deep cascaded neural network. Moreover, the proposed model has fewer network layers than comparable networks without the visual attention mechanism.
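The pipeline described in the abstract, a convolutional neural network for spatial features, a visual attention layer that weights salient image regions, and a long short-term memory network for temporal modeling, can be illustrated with a minimal sketch. The sketch below assumes PyTorch; the layer sizes, the soft-attention formulation and all names (e.g., VisualAttentionDrivingModel, attn_score) are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch (PyTorch assumed) of a CNN + visual-attention + LSTM
# driving model as described in the abstract. Layer sizes and names are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class VisualAttentionDrivingModel(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # CNN: extracts a spatial feature map from each frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 3, stride=2), nn.ReLU(),
        )
        # Visual attention layer: scores each spatial location, then pools
        # feature vectors with the softmax weights, suppressing redundant
        # regions of the scene.
        self.attn_score = nn.Linear(48, 1)
        # LSTM: models temporal dependencies across the frame sequence.
        self.lstm = nn.LSTM(48, hidden_size, batch_first=True)
        # Regression head: predicts the steering angle.
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))        # (B*T, 48, h, w)
        feats = feats.flatten(2).transpose(1, 2)      # (B*T, h*w, 48)
        weights = torch.softmax(self.attn_score(feats), dim=1)  # (B*T, h*w, 1)
        context = (weights * feats).sum(dim=1)        # (B*T, 48)
        out, _ = self.lstm(context.view(B, T, -1))    # (B, T, hidden_size)
        return self.head(out[:, -1])                  # steering angle: (B, 1)

# Usage: a batch of 4 clips, 5 frames each, 66x200 front-camera images.
model = VisualAttentionDrivingModel()
angles = model(torch.randn(4, 5, 3, 66, 200))
print(angles.shape)  # torch.Size([4, 1])
```

In this sketch the attention weights act as a soft mask over the CNN feature map, so the LSTM receives a compact per-frame summary rather than the full feature grid, matching the abstract's stated goal of focusing on important information and reducing information redundancy.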
References:
[1] BROGGI A, CERRI P, DEBATTISTI S, et al. PROUD-public road urban driverless-car test[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(6): 3508-3519.
[2] CHEN C, SEFF A, KORNHAUSER A, et al. DeepDriving: learning affordance for direct perception in autonomous driving[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2722-2730.
[3] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] BOJARSKI M, DEL TESTA D, DWORAKOWSKI D, et al. End to end learning for self-driving cars[EB/OL]. [2019-02-23]. https://arxiv.org/pdf/1604.07316.pdf.
[5] BAI L Y, HU X M, SONG S, et al. Motion planning model based on deep cascaded neural network for autonomous driving[J]. Journal of Computer Applications, 2019, 39(10): 2870-2875.
[6] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[7] XU H, GAO Y, YU F, et al. End-to-end learning of driving models from large-scale video datasets[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3530-3538.
[8] CHI L, MU Y. Deep steering: learning end-to-end driving model from spatial and temporal visual cues[EB/OL]. [2018-08-12]. https://arxiv.org/pdf/1708.03798.pdf.
[9] SHALEV-SHWARTZ S, SHAMMAH S, SHASHUA A. Safe, multi-agent, reinforcement learning for autonomous driving[EB/OL]. [2018-10-11]. https://arxiv.org/pdf/1610.03295.pdf.
[10] EL SALLAB A, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[EB/OL]. [2019-01-10]. https://arxiv.org/pdf/1704.02532.pdf.
[11] ITTI L, KOCH C. Computational modelling of visual attention[J]. Nature Reviews Neuroscience, 2001, 2(3): 194-203.
[12] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2204-2212.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[14] LIANG J W, JIANG L, CAO L, et al. Focal visual-text attention for memex question answering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1893-1908.
[15] XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1502.03044v3.pdf.
[16] LIN Z, FENG M W, DOS SANTOS C N, et al. A structured self-attentive sentence embedding[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1703.03130.pdf.
[17] UNDERWOOD G. Visual attention and the transition from novice to advanced driver[J]. Ergonomics, 2007, 50(8): 1235-1249.
[18] HU X M, YI C H, CHEN Q, et al. Abnormal crowd behavior detection based on motion saliency map[J]. Journal of Computer Applications, 2018, 38(4): 1164-1169.
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[20] WOJNA Z, GORBAN A N, LEE D S, et al. Attention-based extraction of structured information from street view imagery[C]//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2017: 844-850.
[21] ZHANG P P, LI Q S, YANG C H. Image classification algorithm based on lightweight group-wise attention module[J]. Journal of Computer Applications, 2020, 40(3): 645-650.
[22] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1412.6980.pdf.