Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (08): 2283-2288.

• Artificial Intelligence •

Evolutionary operant behavior learning model and its application to mobile robot obstacle avoidance

GAO Yuanyuan, ZHU Fan, SONG Hongjun

  1. School of Information Engineering, Zhejiang Agriculture and Forestry University, Hangzhou Zhejiang 311300, China
  • Received: 2013-02-26  Revised: 2013-05-07  Online: 2013-09-11  Published: 2013-08-01
  • Contact: GAO Yuanyuan
  • About the authors: GAO Yuanyuan (1984-), female, born in Anyang, Henan; lecturer, Ph. D.; research interests: mobile robot control, machine learning, intelligent control.
    ZHU Fan (1979-), female, born in Nanyang, Henan; lecturer, Ph. D.; research interests: biomedical information processing.
    SONG Hongjun (1981-), male, born in Tai'an, Shandong; Ph. D. candidate; research interests: intelligent transportation, machine vision.
  • Supported by: Zhejiang Provincial Youth Science Foundation; Talent Start-up Project of Zhejiang Agriculture and Forestry University

Abstract: To address the poor self-adaptive ability of mobile robots in obstacle avoidance, an Evolutionary Operant Behavior Learning Model (EOBLM) was proposed for mobile robot obstacle avoidance learning in unknown environments, combining the evolutionary mechanism of the Genetic Algorithm (GA) with Adaptive Heuristic Critic (AHC) learning and Operant Conditioning (OC) theory. The model is a modified AHC learning architecture. The Adaptive Critic Element (ACE) is implemented as a multi-layer feedforward neural network whose weights are updated by the TD(λ) algorithm and gradient descent; this stage generates tropism information that serves as intrinsic motivation and directs the course of the agent's evolution. The Adaptive Selection Element (ASE) optimizes operant behaviors to achieve the best mapping from states to actions. The optimization proceeds in two stages: in the first stage, the information entropy obtained by the OC learning algorithm is used as the individual fitness, and the GA searches for the optimal individual; in the second stage, the OC learning algorithm selects the optimal operant behavior within that individual and produces a new information entropy value. Simulation results on mobile robot obstacle avoidance show that the EOBLM enables the robot to actively learn obstacle avoidance through continuous interaction with an unknown environment, and that its self-learning and self-adaptive abilities are stronger than those of the traditional AHC method.

Key words: mobile robot, Adaptive Heuristic Critic (AHC), operant conditioning, Genetic Algorithm (GA), obstacle avoidance
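The two-stage ASE optimization described in the abstract (operant-conditioning-style reinforcement of action probabilities, with information entropy as the GA fitness) can be sketched in toy form. This is an illustrative reconstruction under stated assumptions, not the paper's algorithm: the state/action sizes, the reward function, the OC update rule, and the GA operators below are all placeholders chosen for brevity.

```python
import math
import random

# Illustrative sketch only. Each GA individual is a state -> action-probability
# table; a toy OC pass reinforces rewarded actions, and the table's mean
# information entropy serves as the GA fitness (lower entropy = more decided
# policy, mirroring the entropy-as-fitness idea in the abstract).

N_STATES, N_ACTIONS = 4, 3

def entropy(probs):
    # Shannon entropy of one action distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

def random_individual():
    return [normalize([random.random() + 0.1 for _ in range(N_ACTIONS)])
            for _ in range(N_STATES)]

def oc_update(ind, reward_fn, eta=0.5):
    # Toy operant-conditioning pass: in each state, try the currently
    # dominant action, strengthen or weaken it by the reward, renormalize.
    for s, probs in enumerate(ind):
        a = max(range(N_ACTIONS), key=lambda i: probs[i])
        probs[a] = max(probs[a] + eta * reward_fn(s, a), 1e-6)
        ind[s] = normalize(probs)
    return ind

def fitness(ind):
    # Mean entropy over states; lower is fitter (a more decided policy).
    return sum(entropy(p) for p in ind) / N_STATES

def evolve(reward_fn, pop_size=20, generations=30, seed=0):
    random.seed(seed)
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop = [oc_update(ind, reward_fn) for ind in pop]  # stage 1: OC learning
        pop.sort(key=fitness)                             # entropy as fitness
        elite = pop[: pop_size // 2]
        # Stage 2 (GA): rebuild the population from the elite by row-wise
        # uniform crossover plus a small mutation on one random state row.
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            child = [list(random.choice((ra, rb))) for ra, rb in zip(a, b)]
            s = random.randrange(N_STATES)
            child[s] = normalize([p + 0.1 * random.random() for p in child[s]])
            children.append(child)
        pop = elite + children
    best = min(pop, key=fitness)
    # Return the greedy action per state from the fittest individual.
    return [max(range(N_ACTIONS), key=lambda i: best[s][i])
            for s in range(N_STATES)]

# Toy reward standing in for the obstacle-avoidance environment:
# action (s % N_ACTIONS) is "correct" in state s.
policy = evolve(lambda s, a: 1.0 if a == s % N_ACTIONS else -0.2)
print(policy)
```

The design point the sketch preserves is the alternation in the abstract: OC learning shapes each individual's action probabilities, entropy of the shaped policy ranks individuals for the GA, and the GA in turn supplies new individuals for the next round of OC learning.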

CLC number: