Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (10): 2795-2801. DOI: 10.11772/j.issn.1001-9081.2019020341

• Artificial Intelligence •

Task allocation policy for multiple autonomous underwater vehicles based on a robust Restless Bandits model

LI Xinbin, ZHANG Shoutao, YAN Lei, HAN Song

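  1. Hebei Key Laboratory of Industrial Computer Control Engineering (Yanshan University), Qinhuangdao, Hebei 066004
  • Received: 2019-03-04 Revised: 2019-05-18 Published: 2019-10-10 Online: 2019-07-03
  • Corresponding author: LI Xinbin
  • About the authors: LI Xinbin (born 1969), male, from Beijing, professor, Ph.D.; research interests: underwater acoustic communication network optimization, multi-robot control and optimization, underwater target tracking, machine learning. ZHANG Shoutao (born 1994), male, from Shangrao, Jiangxi, master's student; research interests: machine learning, multi-robot control and optimization. YAN Lei (born 1989), male, from Qinhuangdao, Hebei, Ph.D. candidate; research interests: multi-robot control and optimization, underwater target tracking. HAN Song (born 1989), male, from Shijiazhuang, Hebei, lecturer, Ph.D.; research interests: game theory, multi-robot control and optimization.
  • Funding:
    National Natural Science Foundation of China (61873224, 61571387).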

Multiple autonomous underwater vehicle task allocation policy based on robust Restless Bandit model

LI Xinbin, ZHANG Shoutao, YAN Lei, HAN Song   

  1. Hebei Key Laboratory of Industrial Computer Control Engineering (Yanshan University), Qinhuangdao, Hebei 066004, China
  • Received: 2019-03-04 Revised: 2019-05-18 Published: 2019-10-10 Online: 2019-07-03
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61873224, 61571387).

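Abstract: This work studies task allocation for collaborative information collection by multiple Autonomous Underwater Vehicles (AUVs) in an underwater monitoring network. First, a comprehensive model of the underwater acoustic monitoring network system is built so that the effects of both the node state of the target sensors and the state of the acoustic channel on AUV task allocation are considered together. Second, to handle the multiple unknown interference factors under water and the model inaccuracy they cause, the multi-AUV task allocation system is modeled, on the basis of reinforcement learning theory, as a robust Restless Bandits Problem (RBP). Finally, a robust Whittle algorithm is proposed to solve the resulting RBP and thereby obtain the task allocation policy for the AUVs. Simulation results show that, in an environment with interference, when the system selects 1, 2 and 3 targets, the cumulative system return of the robust AUV allocation policy is 5.5%, 12.3% and 9.6% higher, respectively, than that of an allocation policy that ignores interference, which verifies the effectiveness of the proposed method.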

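Keywords: underwater acoustic monitoring network, autonomous underwater vehicle task allocation, robust control, uncertain model, restless bandits problem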

Abstract: The task allocation problem for collaborative information acquisition by multiple Autonomous Underwater Vehicles (AUVs) in an underwater monitoring network was studied. Firstly, a comprehensive model of the underwater acoustic monitoring network system was constructed to account jointly for the influence of sensor node status and acoustic communication channel status on AUV task allocation. Secondly, considering the multiple unknown interference factors under water and the resulting model inaccuracy, the multi-AUV task allocation system was modeled as a robust Restless Bandits Problem (RBP) based on reinforcement learning theory. Lastly, a robust Whittle algorithm was proposed to solve the RBP and obtain the task allocation policy for the AUVs. Simulation results show that, in an environment with interference, when the system selects 1, 2 and 3 targets, the cumulative system return of the robust allocation policy improves by 5.5%, 12.3% and 9.6% respectively compared with that of the allocation policy that does not consider interference factors, verifying the effectiveness of the proposed method.
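The abstract does not give the details of the robust Whittle algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a discounted single-arm model, a finite set of candidate transition matrices per action as the uncertainty set, worst-case (minimum) Bellman backups, and indexability, so that the index can be found by bisection on the passive subsidy. All function names, the discount factor and the bisection bracket are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def worst_case_next_value(models: Sequence[Matrix], s: int, V: Sequence[float]) -> float:
    """Smallest expected next-state value over the candidate transition models."""
    return min(sum(p * v for p, v in zip(P[s], V)) for P in models)


def action_gap(s: int, subsidy: float,
               r_passive: Sequence[float], r_active: Sequence[float],
               passive_models: Sequence[Matrix], active_models: Sequence[Matrix],
               gamma: float = 0.9, sweeps: int = 200) -> float:
    """Q(s, active) - Q(s, passive) for one arm when passivity earns an extra subsidy."""
    n = len(r_passive)
    V = [0.0] * n
    q0 = q1 = V
    for _ in range(sweeps):  # robust value iteration on the single arm
        q0 = [r_passive[i] + subsidy
              + gamma * worst_case_next_value(passive_models, i, V) for i in range(n)]
        q1 = [r_active[i]
              + gamma * worst_case_next_value(active_models, i, V) for i in range(n)]
        V = [max(a, b) for a, b in zip(q0, q1)]
    return q1[s] - q0[s]


def robust_whittle_index(s: int,
                         r_passive: Sequence[float], r_active: Sequence[float],
                         passive_models: Sequence[Matrix], active_models: Sequence[Matrix],
                         gamma: float = 0.9, steps: int = 60) -> float:
    """Subsidy at which active and passive are equally attractive in state s."""
    # Heuristic bracket (an assumption of this sketch, not a proven bound).
    bound = 2.0 * max(abs(r) for r in list(r_passive) + list(r_active)) / (1.0 - gamma)
    lo, hi = -bound, bound
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        gap = action_gap(s, mid, r_passive, r_active,
                         passive_models, active_models, gamma)
        if gap > 0.0:   # active still preferred, so the index is larger than mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def select_targets(indices: Sequence[float], m: int) -> List[int]:
    """Arms (targets) with the m largest indices, highest index first."""
    return sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)[:m]
```

A simple sanity check for this sketch: when the passive and active actions share the same candidate transition models, the continuation values cancel and the index in state `s` reduces to `r_active[s] - r_passive[s]`. In a multi-AUV setting, `select_targets` would then be called at each step with the current index of every target and the number of targets `m` to visit.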

Key words: underwater acoustic monitoring network, Autonomous Underwater Vehicle (AUV) task allocation, robust control, uncertain model, Restless Bandits Problem (RBP)

CLC Number: