Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (7): 1926-1931. DOI: 10.11772/j.issn.1001-9081.2019112054

• Artificial Intelligence •

End-to-end autonomous driving model based on deep visual attention neural network

HU Xuemin, TONG Xiuchi, GUO Lin, ZHANG Ruohan, KONG Li   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China
  • Received: 2019-12-04  Revised: 2020-03-27  Online: 2020-07-10  Published: 2020-06-29
  • Corresponding author: GUO Lin
  • About the authors: HU Xuemin (1985-), male, born in Yueyang, Hunan, PhD, associate professor, research interests: computer vision and machine learning; TONG Xiuchi (1996-), female, born in Suizhou, Hubei, master's student, research interest: machine learning; GUO Lin (1978-), female, born in Suizhou, Hubei, PhD, associate professor, research interests: image processing and machine learning; ZHANG Ruohan (1997-), female, born in Xiangyang, Hubei, master's student, research interest: deep learning; KONG Li (1995-), male, born in Xianning, Hubei, master's student, research interest: computer vision.
  • Supported by:
    This work is partially supported by the Youth Program of the National Natural Science Foundation of China (61806076) and the Youth Program of the Hubei Provincial Natural Science Foundation (2018CFB158).

Abstract: To address the problems of inaccurate driving-command prediction, bulky model structure and high information redundancy in existing end-to-end autonomous driving methods, a new end-to-end autonomous driving model based on a deep visual attention neural network was proposed. To extract features of autonomous driving scenes more effectively, a visual attention mechanism was introduced into the end-to-end model: a deep visual attention neural network was constructed by fusing a convolutional neural network, a visual attention layer and a long short-term memory network. The proposed model effectively extracts the spatial and temporal features of driving-scene images, focuses on important information and reduces information redundancy, thereby realizing end-to-end autonomous driving that predicts driving commands from the sequential images captured by a front-facing camera. The model was trained and tested on data from a simulated driving environment. Its root mean square errors for steering-angle prediction in four scenes (country road, highway, tunnel and mountain road) are 0.009 14, 0.009 48, 0.002 89 and 0.010 78 respectively, all lower than those of the comparison methods, namely the method proposed by NVIDIA and the method based on a deep cascaded neural network. Moreover, the proposed model has fewer network layers than comparable networks without the visual attention mechanism.

Key words: autonomous driving, end-to-end, visual attention, convolutional neural network, long short-term memory network
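
The pipeline outlined in the abstract (per-frame spatial features from a convolutional network, a visual attention layer that re-weights spatial locations, and a long short-term memory network that aggregates the frame sequence before regressing the steering angle) can be illustrated with a minimal PyTorch sketch. The class name, layer sizes, attention formulation, input resolution and sequence length below are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a CNN + soft visual attention + LSTM steering-angle regressor.
# All hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class DeepVisualAttentionNet(nn.Module):
    def __init__(self, feat_channels=64, lstm_hidden=128):
        super().__init__()
        # Small convolutional backbone extracting per-frame spatial features (assumed structure).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, feat_channels, kernel_size=3, stride=2), nn.ReLU(),
        )
        # Soft spatial attention: one score per feature-map location.
        self.attn_score = nn.Conv2d(feat_channels, 1, kernel_size=1)
        # LSTM aggregates the sequence of attended feature vectors (temporal features).
        self.lstm = nn.LSTM(feat_channels, lstm_hidden, batch_first=True)
        # Regression head predicting the steering angle.
        self.head = nn.Linear(lstm_hidden, 1)

    def forward(self, frames):
        # frames: (batch, seq_len, 3, H, W), a sequence from the front-facing camera.
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w))    # (b*t, C, H', W')
        scores = self.attn_score(feats)                     # (b*t, 1, H', W')
        weights = torch.softmax(scores.flatten(2), dim=-1)  # attention over spatial locations
        attended = (feats.flatten(2) * weights).sum(-1)     # (b*t, C) attention-weighted sum
        out, _ = self.lstm(attended.reshape(b, t, -1))      # (b, t, lstm_hidden)
        return self.head(out[:, -1])                        # one steering angle per sequence


# Usage with random tensors standing in for simulator frames.
model = DeepVisualAttentionNet()
dummy = torch.randn(2, 8, 3, 66, 200)  # 2 sequences of 8 RGB frames
print(model(dummy).shape)              # torch.Size([2, 1])
```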

CLC number: