[1] HART P E, NILSSON N J, RAPHAEL B. A formal basis for the heuristic determination of minimum cost paths[J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107.
[2] YU Z, YU X, KOUDAS N, et al. Distributed processing of k shortest path queries over dynamic road networks[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 665-679.
[3] GLASIUS R, KOMODA A, GIELEN S C A M. Neural network dynamics for path planning and obstacle avoidance[J]. Neural Networks, 1995, 8(1): 125-133.
[4] 孟宪权, 赵英男, 薛青. 遗传算法在路径规划中的应用[J]. 计算机工程, 2008, 34(16): 215-217, 220. (MENG X Q, ZHAO Y N, XUE Q. Application of genetic algorithm in path planning[J]. Computer Engineering, 2008, 34(16): 215-217, 220.)
[5] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 2017: 140-141, 167-172.
[6] LAMINI C, FATHI Y, BENHLIMA S. H-MAS architecture and reinforcement learning method for autonomous robot path planning[C]//Proceedings of the 2017 Intelligent Systems and Computer Vision. Piscataway: IEEE, 2017: 1-7.
[7] 乔俊飞, 侯占军, 阮晓钢. 基于神经网络的强化学习在避障中的应用[J]. 清华大学学报(自然科学版), 2008, 48(S2): 1747-1750. (QIAO J F, HOU Z J, RUAN X G. Application of neural network-based reinforcement learning to obstacle avoidance[J]. Journal of Tsinghua University (Science and Technology), 2008, 48(S2): 1747-1750.)
[8] WANG Y H, LI T H S, LIN C J. Backward Q-learning: the combination of Sarsa algorithm and Q-learning[J]. Engineering Applications of Artificial Intelligence, 2013, 26(9): 2184-2193.
[9] GOSAVI A. Reinforcement learning: a tutorial survey and recent advances[J]. INFORMS Journal on Computing, 2009, 21(2): 178-192.
[10] WANG S. A method of path planning for mobile robot in dynamic and unknown environment[J]. Electrical Engineering, 2010, 45(2): 36-42.
[11] 王军红, 江虹, 黄玉清, 等. 基于RPkNN-Sarsa(λ)强化学习的机器人路径规划方法[J]. 计算机应用研究, 2013, 30(1): 199-201. (WANG J H, JIANG H, HUANG Y Q, et al. Method of RPkNN-Sarsa(λ) reinforcement learning for robot path planning[J]. Application Research of Computers, 2013, 30(1): 199-201.)
[12] ANDRECUT M, ALI M K. Deep-Sarsa: a reinforcement learning algorithm for autonomous navigation[J]. International Journal of Modern Physics C, 2001, 12(10): 1513-1523.
[13] 林联明, 王浩, 王一雄. 基于神经网络的Sarsa强化学习算法[J]. 计算机技术与发展, 2006, 30(1): 30-32. (LIN L M, WANG H, WANG Y X. Sarsa reinforcement learning algorithm based on neural networks[J]. Computer Technology and Development, 2006, 30(1): 30-32.)
[14] VIET H H, AN S H, CHUNG T C. Dyna-Q-based vector direction for path planning problem of autonomous mobile robots in unknown environments[J]. Advanced Robotics, 2013, 27(3): 159-173.
[15] 朱美强. 基于谱图理论的强化学习研究[D]. 徐州: 中国矿业大学, 2012: 55-86. (ZHU M Q. Reinforcement learning based on spectral graph theory[D]. Xuzhou: China University of Mining and Technology, 2012: 55-86.)
[16] 史豪斌, 徐梦, 刘珈妤, 等. 一种基于Dyna-Q学习的旋翼无人机视觉伺服智能控制方法[J]. 控制与决策, 2019, 34(12): 2517-2526. (SHI H B, XU M, LIU J Y, et al. A visual servo intelligent control method for rotor UAV based on Dyna-Q learning[J]. Control and Decision, 2019, 34(12): 2517-2526.)
[17] 余伶俐, 魏亚东, 霍淑欣. 基于MCPDDPG的智能车辆路径规划方法及应用[J/OL]. 控制与决策 [2019-10-09]. https://kns.cnki.net/kcms/detail/detail.aspx?doi=10.13195/j.kzyjc.2019.0460. (YU L L, WEI Y D, HUO S X. The method and application of intelligent vehicle path planning based on MCPDDPG[J/OL]. Control and Decision [2019-10-09]. https://kns.cnki.net/kcms/detail/detail.aspx?doi=10.13195/j.kzyjc.2019.0460.)
[18] 黄颖, 余玉琴. 一种基于稠密卷积网络和竞争架构的改进路径规划算法[J]. 计算机与数字工程, 2019, 47(4): 812-819. (HUANG Y, YU Y Q. An improved path planning algorithm based on densely connected convolutional network and dueling network architecture[J]. Computer and Digital Engineering, 2019, 47(4): 812-819.)
[19] 刘涛, 王淑灵, 詹乃军. 多机器人路径规划的安全性验证[J]. 软件学报, 2017, 28(5): 1118-1127. (LIU T, WANG S L, ZHAN N J. Safety verification of trajectory planning for multiple robots[J]. Journal of Software, 2017, 28(5): 1118-1127.)
[20] 曾纪钧, 梁哲恒. 监督式强化学习在路径规划中的应用研究[J]. 计算机应用与软件, 2018, 35(10): 185-188, 244. (ZENG J J, LIANG Z H. Research on path planning based on supervised reinforcement learning[J]. Computer Applications and Software, 2018, 35(10): 185-188, 244.)
[21] 解易, 顾益军. 基于Stackelberg策略的多agent强化学习警力巡逻路径规划[J]. 北京理工大学学报, 2017, 37(1): 93-99. (XIE Y, GU Y J. Police patrol path planning using Stackelberg-equilibrium-based multi-agent reinforcement learning[J]. Transactions of Beijing Institute of Technology, 2017, 37(1): 93-99.)
[22] 董培方, 张志安, 梅新虎, 等. 引入势场及陷阱搜索的强化学习路径规划算法[J]. 计算机工程与应用, 2018, 54(16): 129-134. (DONG P F, ZHANG Z A, MEI X H, et al. Reinforcement learning path planning algorithm based on gravitational potential field and trap search[J]. Computer Engineering and Applications, 2018, 54(16): 129-134.)