Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3950-3956. DOI: 10.11772/j.issn.1001-9081.2021101814

• Frontier and comprehensive applications

Trajectory control of quadrotor based on reinforcement learning-iterative learning

Xuguang LIU, Changping DU, Yao ZHENG

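  1. School of Aeronautics and Astronautics, Zhejiang University, Hangzhou Zhejiang 310027, China
  • Received: 2021-10-26 Revised: 2021-12-15 Accepted: 2021-12-23 Online: 2022-01-05 Published: 2022-12-10
  • Corresponding author: Changping DU
  • About authors: LIU Xuguang, born in 1997, male, native of Shijiazhuang, Hebei, M. S. candidate. His research interests include navigation, guidance and control, and complex system modeling and simulation.
    ZHENG Yao, born in 1963, male, native of Yuhuan, Zhejiang, Ph. D., professor. His research interests include aircraft design, and aerospace propulsion theory and engineering.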

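Abstract:

To further improve the trajectory tracking accuracy of quadrotor Unmanned Aerial Vehicles (UAVs) in unknown environments, a control method that adds an iterative learning feedforward controller to the traditional feedback control architecture was proposed. To address the difficulty of tuning the learning parameters in Iterative Learning Control (ILC), a method that uses Reinforcement Learning (RL) to tune and optimize the learning parameters of the iterative learning controller was proposed. Firstly, RL was used to optimize the learning parameters of the iterative learning controller, selecting the optimal learning parameters for the current environment and task so that the iterative learning controller achieves its best control performance. Secondly, the learning ability of the iterative learning controller was used to refine the feedforward input iteratively until perfect tracking was achieved. Finally, in a simulation environment with random noise, the proposed Reinforcement Learning-Iterative Learning Control (RL-ILC) algorithm was compared with the ILC method without parameter optimization, the Sliding Mode Control (SMC) method, and the Proportional-Integral-Derivative (PID) control method. Experimental results show that after 2 iterations the total error of the proposed algorithm is reduced to 0.2% of the initial error, achieving fast convergence. Moreover, unlike the SMC and PID control methods, the RL-ILC algorithm exhibits no noise-induced trajectory fluctuations after it converges. These results indicate that the proposed algorithm can effectively improve the accuracy and robustness of UAV trajectory tracking.

Key words: Iterative Learning Control (ILC), Reinforcement Learning (RL), quadrotor Unmanned Aerial Vehicle (UAV), parameter tuning, trajectory tracking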
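The abstract outlines the scheme without giving the plant model, the ILC update law, or the RL formulation. As a minimal sketch under stated assumptions, the Python code below uses a hypothetical first-order single-axis plant y(t+1) = a*y(t) + b*u(t) + w(t) in place of the quadrotor dynamics, a proportional feedback loop, and a P-type ILC feedforward update u_ff,k+1(t) = u_ff,k(t) + gamma*e_k(t+1) added on top of that loop. For the parameter-tuning step it swaps in an epsilon-greedy multi-armed bandit, the simplest single-state form of RL, which picks the learning gain gamma from a hypothetical candidate list using the error reduction after a few ILC iterations as its reward. The function names (run_trial, ilc, tune_gamma), plant coefficients, gains, noise level, and reward definition are illustrative assumptions, not taken from the paper.

```python
import math
import random


def run_trial(u_ff, ref, a=0.9, b=0.5, kp=0.8, noise=0.0, rng=None):
    """Run one trial on a HYPOTHETICAL first-order plant
    y[t+1] = a*y[t] + b*u[t] + w[t] (a toy stand-in for one quadrotor axis)
    with u[t] = u_ff[t] + kp*e[t]. Returns e[t] = ref[t] - y[t] for every sample."""
    if rng is None:
        rng = random.Random(0)
    y = 0.0
    errors = [ref[0] - y]
    for t in range(len(ref) - 1):
        u = u_ff[t] + kp * errors[t]  # ILC feedforward + proportional feedback
        y = a * y + b * u + noise * rng.gauss(0.0, 1.0)  # w[t]: Gaussian noise
        errors.append(ref[t + 1] - y)
    return errors


def ilc(ref, gamma, iterations, noise=0.0, seed=0):
    """P-type ILC with one-step lead: u_ff[t] += gamma * e[t+1] after each trial.
    Returns the total absolute tracking error of trials 0..iterations."""
    rng = random.Random(seed)
    u_ff = [0.0] * (len(ref) - 1)  # trial 0 uses feedback only
    totals = []
    for _ in range(iterations + 1):
        e = run_trial(u_ff, ref, noise=noise, rng=rng)
        totals.append(sum(abs(v) for v in e))
        u_ff = [u + gamma * e[t + 1] for t, u in enumerate(u_ff)]
    return totals


def tune_gamma(ref, candidates, episodes=40, iterations=5, epsilon=0.2,
               noise=0.01, seed=1):
    """Epsilon-greedy bandit (single-state RL stand-in) over candidate gains.
    Reward of one episode: -log(final total error / initial total error)."""
    rng = random.Random(seed)
    n = len(candidates)
    counts = [0] * n
    values = [0.0] * n  # running mean reward per candidate gain
    for ep in range(episodes):
        if ep < n:
            arm = ep  # try every candidate once
        elif rng.random() < epsilon:
            arm = rng.randrange(n)  # explore
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit
        totals = ilc(ref, candidates[arm], iterations, noise=noise,
                     seed=rng.randrange(10**6))
        reward = -math.log(totals[-1] / totals[0])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    best = max(range(n), key=lambda i: values[i])
    return candidates[best], values


if __name__ == "__main__":
    ref = [math.sin(2 * math.pi * t / 50) for t in range(51)]  # one sine period
    best, values = tune_gamma(ref, [0.25, 0.5, 1.0, 1.5, 2.5])
    totals = ilc(ref, best, iterations=10)
    print("chosen gamma:", best)
    print("error ratio after 10 iterations:", totals[-1] / totals[0])
```

In this toy loop the closed-loop pole is a - b*kp = 0.5, and some candidate gains shrink the error quickly while a too-large gain (2.5 here) makes the low-frequency error grow over the first iterations. The bandit only needs to steer away from such gains by observing error reduction, which mirrors the role the abstract assigns to RL without reproducing the authors' formulation.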
