Abstract: To address the problems of low driving-command prediction accuracy, bulky model structure and heavy information redundancy in existing end-to-end autonomous driving methods, a new end-to-end autonomous driving model based on a deep visual attention neural network was proposed. To extract the features of autonomous driving scenes effectively, a visual attention mechanism was introduced into the end-to-end model, yielding a deep visual attention neural network composed of a convolutional neural network, a visual attention layer and a long short-term memory network. The proposed model can effectively extract the spatial and temporal features of driving scene images, focus on important information and reduce information redundancy, thereby realizing end-to-end autonomous driving that predicts driving commands from sequential images captured by a front-facing camera. The model was trained and tested on data from a simulated driving environment. The root mean square errors of the proposed model for steering angle prediction in four scenes, namely country road, highway, tunnel and mountain road, are 0.009 14, 0.009 48, 0.002 89 and 0.010 78 respectively, all lower than those of the method proposed by NVIDIA and the method based on the deep cascaded neural network. Moreover, the proposed model has fewer network layers than comparable networks without the visual attention mechanism.
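The pipeline described in the abstract, a convolutional neural network for spatial features, a visual attention layer that weights salient image regions, and a long short-term memory network for temporal modeling, can be illustrated with a minimal sketch. The sketch below assumes PyTorch; the layer sizes, the soft-attention formulation and all names (e.g., VisualAttentionDrivingModel, attn_score) are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch (PyTorch assumed) of a CNN + visual-attention + LSTM
# driving model as described in the abstract. Layer sizes and names are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class VisualAttentionDrivingModel(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # CNN: extracts a spatial feature map from each frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 3, stride=2), nn.ReLU(),
        )
        # Visual attention layer: scores each spatial location, then pools
        # feature vectors with the softmax weights, suppressing redundant
        # regions of the scene.
        self.attn_score = nn.Linear(48, 1)
        # LSTM: models temporal dependencies across the frame sequence.
        self.lstm = nn.LSTM(48, hidden_size, batch_first=True)
        # Regression head: predicts the steering angle.
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))        # (B*T, 48, h, w)
        feats = feats.flatten(2).transpose(1, 2)      # (B*T, h*w, 48)
        weights = torch.softmax(self.attn_score(feats), dim=1)  # (B*T, h*w, 1)
        context = (weights * feats).sum(dim=1)        # (B*T, 48)
        out, _ = self.lstm(context.view(B, T, -1))    # (B, T, hidden_size)
        return self.head(out[:, -1])                  # steering angle: (B, 1)

# Usage: a batch of 4 clips, 5 frames each, 66x200 front-camera images.
model = VisualAttentionDrivingModel()
angles = model(torch.randn(4, 5, 3, 66, 200))
print(angles.shape)  # torch.Size([4, 1])
```

In this sketch the attention weights act as a soft mask over the CNN feature map, so the LSTM receives a compact per-frame summary rather than the full feature grid, matching the abstract's stated goal of focusing on important information and reducing information redundancy.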
References:
[1] BROGGI A, CERRI P, DEBATTISTI S, et al. PROUD-public road urban driverless-car test[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(6): 3508-3519.
[2] CHEN C, SEFF A, KORNHAUSER A, et al. DeepDriving: learning affordance for direct perception in autonomous driving[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 2722-2730.
[3] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] BOJARSKI M, DEL TESTA D, DWORAKOWSKI D, et al. End to end learning for self-driving cars[EB/OL]. [2019-02-23]. https://arxiv.org/pdf/1604.07316.pdf.
[5] BAI L Y, HU X M, SONG S, et al. Motion planning model based on deep cascaded neural network for autonomous driving[J]. Journal of Computer Applications, 2019, 39(10): 2870-2875.
[6] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[7] XU H, GAO Y, YU F, et al. End-to-end learning of driving models from large-scale video datasets[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3530-3538.
[8] CHI L, MU Y. Deep steering: learning end-to-end driving model from spatial and temporal visual cues[EB/OL]. [2018-08-12]. https://arxiv.org/pdf/1708.03798.pdf.
[9] SHALEV-SHWARTZ S, SHAMMAH S, SHASHUA A. Safe, multi-agent, reinforcement learning for autonomous driving[EB/OL]. [2018-10-11]. https://arxiv.org/pdf/1610.03295.pdf.
[10] EL SALLAB A, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[EB/OL]. [2019-01-10]. https://arxiv.org/pdf/1704.02532.pdf.
[11] ITTI L, KOCH C. Computational modelling of visual attention[J]. Nature Reviews Neuroscience, 2001, 2(3): 194-203.
[12] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2204-2212.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[14] LIANG J W, JIANG L, CAO L, et al. Focal visual-text attention for memex question answering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1893-1908.
[15] XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1502.03044v3.pdf.
[16] LIN Z, FENG M W, DOS SANTOS C N, et al. A structured self-attentive sentence embedding[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1703.03130.pdf.
[17] UNDERWOOD G. Visual attention and the transition from novice to advanced driver[J]. Ergonomics, 2007, 50(8): 1235-1249.
[18] HU X M, YI C H, CHEN Q, et al. Abnormal crowd behavior detection based on motion saliency map[J]. Journal of Computer Applications, 2018, 38(4): 1164-1169.
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[20] WOJNA Z, GORBAN A N, LEE D S, et al. Attention-based extraction of structured information from street view imagery[C]//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Piscataway: IEEE, 2017: 844-850.
[21] ZHANG P P, LI Q S, YANG C H. Image classification algorithm based on lightweight group-wise attention module[J]. Journal of Computer Applications, 2020, 40(3): 645-650.
[22] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2018-12-09]. https://arxiv.org/pdf/1412.6980.pdf.