Trajectory control of quadrotor based on reinforcement learning-iterative learning

doi:10.11772/j.issn.1001-9081.2021101814

Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3950-3956. DOI: 10.11772/j.issn.1001-9081.2021101814

• Frontier and comprehensive applications •

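Xuguang LIU, Changping DU, Yao ZHENG

School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027

Received: 2021-10-26 Revised: 2021-12-15 Accepted: 2021-12-23 Online: 2022-01-05 Published: 2022-12-10
Corresponding author: Changping DU
About the authors: LIU Xuguang (born 1997), male, from Shijiazhuang, Hebei, M.S. candidate. His research interests include navigation, guidance and control, and complex system modeling and simulation.
ZHENG Yao (born 1963), male, from Yuhuan, Zhejiang, Ph.D., professor. His research interests include aircraft design, and aerospace propulsion theory and engineering.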

Trajectory control of quadrotor based on reinforcement learning-iterative learning

Xuguang LIU, Changping DU, Yao ZHENG

School of Aeronautics and Astronautics, Zhejiang University, Hangzhou Zhejiang 310027, China

Received: 2021-10-26 Revised: 2021-12-15 Accepted: 2021-12-23 Online: 2022-01-05 Published: 2022-12-10
Contact: Changping DU
About author: LIU Xuguang, born in 1997, M.S. candidate. His research interests include navigation, guidance and control, complex system modeling and simulation.
ZHENG Yao, born in 1963, Ph.D., professor. His research interests include aircraft design, aerospace propulsion theory and engineering.

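Abstract

Abstract (Chinese version):

To further improve the trajectory tracking accuracy of quadrotor unmanned aerial vehicles (UAVs) in unknown environments, a control method that adds an iterative learning feedforward controller to a conventional feedback control architecture is proposed. To address the difficulty of tuning the learning parameters in Iterative Learning Control (ILC), a method that uses Reinforcement Learning (RL) to tune and optimize the learning parameters of the iterative learning controller is proposed. First, RL is used to optimize the learning parameters of the iterative learning controller, selecting the parameters that are optimal for the current environment and task so that the controller performs at its best. Second, the learning ability of the iterative learning controller is used to refine the feedforward input iteration by iteration until perfect tracking is achieved. Finally, in a simulation environment with random noise, the proposed Reinforcement Learning-Iterative Learning Control (RL-ILC) algorithm is compared with an ILC method without parameter optimization, the Sliding Mode Control (SMC) method, and the Proportional-Integral-Derivative (PID) control method. The experimental results show that after 2 iterations the total error of the proposed algorithm is reduced to 0.2% of the initial error, which indicates rapid convergence; moreover, unlike the SMC and PID methods, the RL-ILC algorithm shows no noise-induced trajectory fluctuations once it has converged. The proposed algorithm can therefore effectively improve the accuracy and robustness of UAV trajectory tracking.

Keywords: iterative learning control, reinforcement learning, quadrotor UAV, parameter tuning, trajectory tracking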
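The abstract describes an iterative learning feedforward controller layered on top of a feedback loop, with the feedforward input refined from one trial to the next. The following is a minimal sketch of that idea, not the authors' implementation: the first-order plant, the proportional feedback gain `kp`, the repeating disturbance `dist`, the sinusoidal reference, and the P-type update law with learning gain `gain` are all assumptions chosen for illustration. The quadrotor dynamics and the random noise used in the paper are not modeled here.

```python
import math

def run_trial(u_ff, ref, a=0.9, b=0.1, kp=2.0, dist=0.05):
    """Simulate one trial of a toy first-order plant (assumed, not the quadrotor).

    The control input is proportional feedback plus the learned feedforward.
    Returns the tracking errors e[t] = ref[t] - x[t] for t = 0..N.
    """
    x = 0.0
    errors = [ref[0] - x]
    for t in range(len(u_ff)):
        u = kp * (ref[t] - x) + u_ff[t]   # feedback + feedforward
        x = a * x + b * u + dist          # same disturbance in every trial
        errors.append(ref[t + 1] - x)
    return errors

def ilc_update(u_ff, errors, gain):
    """P-type ILC law: u_{k+1}[t] = u_k[t] + gain * e_k[t+1]."""
    return [u + gain * errors[t + 1] for t, u in enumerate(u_ff)]

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def learn(gain=5.0, iterations=20, steps=100):
    """Run repeated trials and return the RMS tracking error of each trial."""
    ref = [math.sin(2 * math.pi * t / steps) for t in range(steps + 1)]
    u_ff = [0.0] * steps                  # first trial uses feedback only
    history = []
    for _ in range(iterations):
        errors = run_trial(u_ff, ref)
        history.append(rms(errors[1:]))
        u_ff = ilc_update(u_ff, errors, gain)
    return history

history = learn()
print(["%.5f" % h for h in history])
```

In this toy setting the RMS error shrinks from trial to trial because the disturbance and the reference repeat exactly, which is the situation ILC is designed for. How fast it shrinks depends strongly on `gain`, which is the tuning difficulty the abstract refers to.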

Abstract:

To further improve the trajectory tracking accuracy of quadrotors in unknown environments, a control method that adds an iterative learning feedforward controller to the traditional feedback control architecture was proposed. To address the difficulty of tuning the learning parameters in Iterative Learning Control (ILC), a method that tunes and optimizes the learning parameters of the iterative learning controller by using Reinforcement Learning (RL) was proposed. Firstly, RL was used to optimize the learning parameters of the iterative learning controller, and the learning parameters that are optimal for the current environment and task were selected to ensure the best control effect of the iterative learning controller. Then, with the learning ability of the iterative learning controller, the feedforward input was optimized iteratively until perfect tracking was achieved. Finally, in a simulation environment with random noise, experiments were carried out to compare the proposed Reinforcement Learning-Iterative Learning Control (RL-ILC) algorithm with the ILC method without parameter optimization, the Sliding Mode Control (SMC) method and the Proportional-Integral-Derivative (PID) control method. Experimental results show that after two iterations, the total error of the proposed algorithm is reduced to 0.2% of the initial error, achieving rapid convergence. Compared with the SMC and PID control methods, the RL-ILC algorithm is not affected by noise and does not produce trajectory fluctuations after it has converged. The results illustrate that the proposed algorithm can effectively improve the accuracy and robustness of UAV trajectory tracking.

Key words: Iterative Learning Control (ILC), reinforcement learning, quadrotor, parameter tuning, trajectory tracking
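The abstract states that RL is used to select the ILC learning parameters that work best for the current environment and task, but it does not say which RL algorithm is used. As a stand-in, the sketch below treats the choice of a single learning gain as an epsilon-greedy multi-armed bandit, where the reward is the negative RMS tracking error after a few ILC iterations. The toy plant, the candidate gains, the noise level, and every function name here are hypothetical and chosen only for illustration.

```python
import math
import random

def trial_error(gain, rng, noise_std, ilc_iters=3, steps=50,
                a=0.9, b=0.1, kp=2.0):
    """RMS tracking error after `ilc_iters` P-type ILC updates on a toy plant."""
    ref = [math.sin(2 * math.pi * t / steps) for t in range(steps + 1)]
    u_ff = [0.0] * steps
    err_rms = 0.0
    for _ in range(ilc_iters + 1):
        x, errors = 0.0, []
        for t in range(steps):
            u = kp * (ref[t] - x) + u_ff[t]              # feedback + feedforward
            x = a * x + b * u + rng.gauss(0.0, noise_std)
            errors.append(ref[t + 1] - x)                # errors[t] holds e[t+1]
        err_rms = math.sqrt(sum(e * e for e in errors) / steps)
        u_ff = [u + gain * e for u, e in zip(u_ff, errors)]
    return err_rms

def tune_gain(candidates, episodes=200, epsilon=0.1, noise_std=0.01, seed=0):
    """Epsilon-greedy bandit in which each arm is one candidate learning gain."""
    rng = random.Random(seed)
    arms = range(len(candidates))
    q = [0.0] * len(candidates)   # running mean reward per arm
    n = [0] * len(candidates)     # number of pulls per arm
    for episode in range(episodes):
        if episode < len(candidates):
            arm = episode                         # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(len(candidates))  # explore
        else:
            arm = max(arms, key=q.__getitem__)    # exploit the best estimate
        reward = -trial_error(candidates[arm], rng, noise_std)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]      # incremental mean update
    best = max(arms, key=q.__getitem__)
    return candidates[best], q

best_gain, q_values = tune_gain([0.5, 2.0, 5.0, 8.0, 12.0])
print(best_gain, ["%.4f" % v for v in q_values])
```

A bandit ignores state, so it is only the simplest form of the idea. A method that adapts the learning parameters to the flight condition would need a state-dependent policy, which this sketch does not attempt.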

CLC Number: TP273

Xuguang LIU, Changping DU, Yao ZHENG. Trajectory control of quadrotor based on reinforcement learning-iterative learning[J]. Journal of Computer Applications, 2022, 42(12): 3950-3956.

Figures/Tables (6)

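Tab. 1 Some parameters of quadrotor

Parameter	Value
Moment of inertia about the $X$ axis $J_{xx}$	$1.745 \times 10^{-2}\ \mathrm{kg \cdot m^{2}}$
Moment of inertia about the $Y$ axis $J_{yy}$	$1.745 \times 10^{-2}\ \mathrm{kg \cdot m^{2}}$
Moment of inertia about the $Z$ axis $J_{zz}$	$3.175 \times 10^{-2}\ \mathrm{kg \cdot m^{2}}$
Body radius $d$	$0.225\ \mathrm{m}$
Body mass $m$	$1.5\ \mathrm{kg}$
Lumped propeller thrust parameter $C_T$	$1.105 \times 10^{-5}\ \mathrm{N \cdot (rad \cdot s^{-1})^{2}}$
Lumped propeller torque parameter $C_m$	$1.489 \times 10^{-7}\ \mathrm{m}$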
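As a rough plausibility check on the values in Tab. 1, the sketch below estimates the rotor speed needed to hover. It rests on assumptions that are not stated on this page: that each rotor produces a thrust of $C_T \omega^2$ (a common quadrotor model, which reads $C_T$ as newtons per squared rad/s), that the four rotors are identical, and that $g = 9.81\ \mathrm{m/s^2}$. The function and constant names are hypothetical.

```python
import math

# Values copied from Tab. 1
MASS_KG = 1.5             # body mass m
THRUST_COEFF = 1.105e-5   # lumped propeller thrust parameter C_T

GRAVITY = 9.81            # assumed gravitational acceleration, m/s^2

def hover_rotor_speed(mass=MASS_KG, c_t=THRUST_COEFF, g=GRAVITY):
    """Rotor speed (rad/s) at which four identical rotors balance the weight.

    Assumes per-rotor thrust = c_t * omega**2, so 4 * c_t * omega**2 = mass * g.
    """
    return math.sqrt(mass * g / (4.0 * c_t))

omega = hover_rotor_speed()
print("hover rotor speed: %.1f rad/s (%.0f rpm)"
      % (omega, omega * 60.0 / (2.0 * math.pi)))
```

Under these assumptions the hover speed comes out at roughly 577 rad/s per rotor, which is a plausible order of magnitude for a 1.5 kg quadrotor.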

Fig. 1 Structure of UAV control system

Fig. 2 Structure of iterative learning control system

Tab. 2 Initial parameters of input variables

Variable	Initial value
$U_1$	10
$U_2$	0
$U_3$	0
$U_4$	0
$\theta$	0
$\dot{\theta}$	0
$\gamma$	0
$\dot{\gamma}$	0

Fig. 3 Scatter diagram of reinforcement learning parameter optimization

Fig. 4 Trajectory tracking comparison between reinforcement learning-iterative learning and iterative learning
