UAV path planning for persistent monitoring based on value function iteration

doi:10.11772/j.issn.1001-9081.2022091464

Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3290-3296.

Topic: Frontier and comprehensive applications

Chen LIU^1,2, Yang CHEN^1,2, Hao FU^3

^1.Institute of Robotics and Intelligent Systems, Wuhan University of Science and Technology, Wuhan Hubei 430081, China
^2.Engineering Research Center for Metallurgical Automation and Measurement Technology of Ministry of Education (Wuhan University of Science and Technology), Wuhan Hubei 430081, China
^3.School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430081, China

Received: 2022-09-30 Revised: 2023-01-13 Accepted: 2023-01-15 Online: 2023-03-02 Published: 2023-10-10
Contact: Yang CHEN
About author: LIU Chen, born in 1998 in Honghu, Hubei, M. S. candidate. His research interests include robot navigation and path planning.
FU Hao, born in 1988 in Taojiang, Hunan, Ph. D., lecturer. His research interests include multi-robot reinforcement learning.
Supported by:
National Natural Science Foundation of China (62173262)

Abstract

Persistent monitoring of a designated area by an Unmanned Aerial Vehicle (UAV) can deter intrusion and damage and help detect anomalies in time, but a fixed monitoring pattern is easily discovered by intruders, so a randomized algorithm for the UAV flight path is needed. To address this problem, a UAV persistent monitoring path planning algorithm based on Value Function Iteration (VFI) was proposed. Firstly, the states of the monitoring target points were selected reasonably, and the remaining time of each monitoring node was analyzed. Secondly, the value function of the state corresponding to each monitoring target point was constructed by combining the reward/penalty benefit with the path security constraint, and during the VFI process the next node was selected randomly based on the ε principle and roulette selection. Finally, the UAV persistent monitoring path was solved with the goal that the growth of the value functions of all states tends to saturate. Simulation results show that the proposed algorithm achieves an information entropy of 0.9050 and a VFI running time of 0.3637 s. Compared with the traditional Ant Colony Optimization (ACO) algorithm, the information entropy is increased by 216% and the running time is reduced by 59%, so both randomness and speed are improved. These results verify that a randomized UAV flight path is of great significance for improving the efficiency of persistent monitoring.

Key words: path planning, persistent monitoring, value iteration, roulette selection, ε principle
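
The abstract mentions that the next node is chosen at random using the ε principle together with roulette selection, but this page does not give the exact rule. The following is only a minimal sketch under stated assumptions: the function name `select_next_node`, the `values` dictionary, and the convention "with probability ε sample by roulette wheel, otherwise take the highest-valued node" are all hypothetical and may differ from the paper's formulation.

```python
import random

def select_next_node(values, epsilon, rng=random):
    """Pick the next node from candidate value estimates.

    values:  dict mapping candidate node -> non-negative value (hypothetical input).
    epsilon: probability of sampling by roulette wheel instead of acting greedily
             (assumed convention, not taken from the paper).
    """
    nodes = list(values)
    if rng.random() >= epsilon:
        # Greedy branch: take the node with the highest value.
        return max(nodes, key=lambda n: values[n])
    total = sum(values[n] for n in nodes)
    if total <= 0:
        return rng.choice(nodes)  # degenerate case: fall back to a uniform choice
    # Roulette wheel: each node is chosen with probability value / total.
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for n in nodes:
        cumulative += values[n]
        if pick < cumulative:
            return n
    return nodes[-1]  # guard against floating-point round-off
```

With `epsilon = 0` the choice is fully deterministic, which is the kind of fixed pattern the abstract warns against; larger values of `epsilon` make the visiting order harder to predict.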

CLC Number:

TP242

Chen LIU, Yang CHEN, Hao FU. UAV path planning for persistent monitoring based on value function iteration[J]. Journal of Computer Applications, 2023, 43(10): 3290-3296.

Figures/Tables 13

Tab. 1 Simulation parameters

Parameter	Value	Parameter	Value
M	10.0	θ	0.5
α	0.2	K	10.0
γ	0.3	σ	0.1

Fig. 1 Road network of persistent monitoring

Tab. 2 Position and maximum allowable monitoring period of each node

Node	Coordinates	Maximum allowable monitoring period/s	Node	Coordinates	Maximum allowable monitoring period/s
1	(5, 5)	25.0	6	(15, 10)	23.0
2	(10, 5)	26.0	7	(5, 15)	22.0
3	(15, 5)	27.0	8	(15, 15)	29.0
4	(5, 10)	28.0	9	(20, 15)	21.0
5	(10, 10)	24.0	10	(20, 5)	25.5
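
The abstract refers to the remaining time of each monitoring node, and Tab. 2 supplies the node positions and maximum allowable monitoring periods. A rough sketch of how such remaining times could be tracked is given below. It rests on several assumptions that are not stated on this page: straight-line flight between any two nodes (the real road network in Fig. 1 may restrict the edges), a hypothetical constant speed of 1 distance unit per second, and a remaining time that resets to the node's full period on each visit. The names `NODES`, `travel_time`, and `step` are illustrative only.

```python
import math

# Node id -> ((x, y), maximum allowable monitoring period in s), copied from Tab. 2.
NODES = {
    1: ((5, 5), 25.0), 2: ((10, 5), 26.0), 3: ((15, 5), 27.0),
    4: ((5, 10), 28.0), 5: ((10, 10), 24.0), 6: ((15, 10), 23.0),
    7: ((5, 15), 22.0), 8: ((15, 15), 29.0), 9: ((20, 15), 21.0),
    10: ((20, 5), 25.5),
}

def travel_time(a, b, speed=1.0):
    """Straight-line flight time between nodes a and b (speed is an assumption)."""
    (xa, ya), _ = NODES[a]
    (xb, yb), _ = NODES[b]
    return math.hypot(xb - xa, yb - ya) / speed

def step(remaining, current, nxt, speed=1.0):
    """Return the remaining times of all nodes after flying current -> nxt."""
    dt = travel_time(current, nxt, speed)
    updated = {n: t - dt for n, t in remaining.items()}
    updated[nxt] = NODES[nxt][1]  # a visit resets the node to its full period
    return updated

# Start with every node at its full period, then fly from node 1 to node 2.
remaining = {n: period for n, (_, period) in NODES.items()}
remaining = step(remaining, 1, 2)
```

Under these assumptions, a node whose remaining time drops below zero has exceeded its maximum allowable monitoring period, which is the situation a penalty term would be expected to discourage.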

Fig. 2 Convergence threshold

Fig. 3 Persistent monitoring path based on VFI

Fig. 4 Grayscale images of the probability matrix with initial points 1 and 10
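
Fig. 4 concerns a probability matrix, and the abstract uses information entropy as the measure of path randomness. The paper's exact entropy definition is not reproduced on this page, so the sketch below simply assumes the standard Shannon entropy (natural logarithm) averaged over the rows of a row-stochastic transition matrix. The function names `shannon_entropy` and `mean_row_entropy` are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (natural log) of one discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_row_entropy(matrix):
    """Average Shannon entropy over the rows of a row-stochastic matrix."""
    return sum(shannon_entropy(row) for row in matrix) / len(matrix)

# A fixed tour: every node has exactly one successor.
deterministic = [[0.0, 1.0], [1.0, 0.0]]
# A fully random walk over two nodes.
uniform = [[0.5, 0.5], [0.5, 0.5]]
```

Under this assumed definition, a path that always moves to the same successor has zero entropy, while spreading probability over several successors raises it.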

Fig. 5 Convergence curves for three paths

Tab. 3 Persistent monitoring path comparison

Path	Number of repeated states	Monitoring frequency/Hz	Path	Number of repeated states	Monitoring frequency/Hz

Fig. 6 Three optimized monitoring paths

Tab. 4 Comparison of path results of different persistent monitoring algorithms

Path	Information entropy	Running time/s	Path	Information entropy	Running time/s
Original path	0.4832	0.2633	VFI-1	0.8666	0.2855
GA	0.0000	0.0015	VFI-2	0.9050	0.3637
ACO	0.2860	0.8829	VFI-3	1.4831	0.4012
SA	0.0000	4.1240
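
The percentages quoted in the abstract can be checked against Tab. 4. The entropy and running time given there for the proposed algorithm appear to correspond to the VFI-2 row, compared against the ACO row; that pairing is an inference from the matching numbers, not something the page states. The helper name `relative_change` is hypothetical.

```python
def relative_change(new, old):
    """Signed relative change of new with respect to old, in percent."""
    return (new - old) / old * 100.0

entropy_gain = relative_change(0.9050, 0.2860)    # VFI-2 vs ACO information entropy
runtime_change = relative_change(0.3637, 0.8829)  # VFI-2 vs ACO running time

print(f"{entropy_gain:.0f}% {runtime_change:.0f}%")  # prints "216% -59%"
```

The results agree with the abstract: the entropy rises by about 216% and the running time falls by about 59%.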

Fig. 7 Map of part of an urban area and the corresponding actual road network

Fig. 8 Convergence curve of monitoring path in actual road network

Fig. 9 Optimal persistent monitoring path in actual road network

