[1] KARAMAN S,WALTER M R,PEREZ A,et al. Anytime motion planning using the RRT*[C]//Proceedings of the 2011 IEEE International Conference on Robotics and Automation. Piscataway:IEEE,2011:1478-1483.
[2] BELL F. Connectivism:its place in theory-informed research and innovation in technology-enabled learning[J]. International Review of Research in Open and Distance Learning,2011,12(3):98-118.
[3] KOREN Y,BORENSTEIN J. Potential field methods and their inherent limitations for mobile robot navigation[C]//Proceedings of the 1991 IEEE International Conference on Robotics and Automation. Piscataway:IEEE,1991:1398-1404.
[4] ZHANG B,CHEN W,FEI M. An optimized method for path planning based on artificial potential field[C]//Proceedings of the 6th International Conference on Intelligent Systems Design and Applications. Piscataway:IEEE,2006:35-39.
[5] SCHAAL S. Is imitation learning the route to humanoid robots?[J]. Trends in Cognitive Sciences,1999,3(6):233-242.
[6] SUTTON R S,BARTO A G. Reinforcement Learning:An Introduction[M]. Cambridge:MIT Press,1998:1-4.
[7] CHEN C,SEFF A,KORNHAUSER A,et al. DeepDriving:learning affordance for direct perception in autonomous driving[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE,2015:2722-2730.
[8] BANSAL M,KRIZHEVSKY A,OGALE A. ChauffeurNet:learning to drive by imitating the best and synthesizing the worst[EB/OL].[2018-12-07]. https://arxiv.org/pdf/1812.03079.pdf.
[9] MNIH V,KAVUKCUOGLU K,SILVER D,et al. Playing Atari with deep reinforcement learning[EB/OL].[2018-12-12]. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf.
[10] ZHAO Y T,HAN B L,LUO Q S. Walking stability control method based on deep Q-network for biped robot on uneven ground[J]. Journal of Computer Applications,2018,38(9):2459-2463.
[11] LILLICRAP T P,HUNT J J,PRITZEL A,et al. Continuous control with deep reinforcement learning[EB/OL].[2019-02-22]. https://arxiv.org/pdf/1509.02971v2.pdf.
[12] ZHANG Z,HE S,HE J Q. Video colorization method based on hybrid neural network model of long short term memory and convolutional neural network[J]. Journal of Computer Applications,2019,39(9):2726-2730.
[13] BOJARSKI M,DEL TESTA D,DWORAKOWSKI D,et al. End to end learning for self-driving cars[EB/OL].[2019-02-23]. http://arxiv.org/pdf/1604.07316.pdf.
[14] HOCHREITER S,SCHMIDHUBER J. Long short-term memory[J]. Neural Computation,1997,9(8):1735-1780.
[15] WATKINS C J C H,DAYAN P. Q-learning[J]. Machine Learning,1992,8(3/4):279-292.
[16] ZHANG R,LIU C,CHEN Q. End-to-end control of kart agent with deep reinforcement learning[C]//Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics. Piscataway:IEEE,2018:1688-1693.
[17] DOSOVITSKIY A,ROS G,CODEVILLA F,et al. CARLA:an open urban driving simulator[EB/OL].[2018-11-10]. https://arxiv.org/pdf/1711.03938.pdf.
[18] FANG H,YANG H R. Greedy algorithms and compressed sensing[J]. Acta Automatica Sinica,2011,37(12):1413-1421.
[19] MNIH V,BADIA A P,MIRZA M,et al. Asynchronous methods for deep reinforcement learning[EB/OL].[2018-11-10]. https://arxiv.org/pdf/1602.01783.pdf.