Real-time reconstruction method of visual information for manipulator operation

Qingyu JIA1, Liang CHANG1, Xianyi YANG2, Baohua QIANG2, Shihao ZHANG1, Wu XIE2, Minghao YANG2

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
  • Received: 2022-03-08 Revised: 2022-05-24 Accepted: 2022-05-26 Online: 2022-08-16 Published: 2023-04-10
  • Contact: Xianyi YANG
  • About the authors: JIA Qingyu, born in 1995, a native of Datong, Shanxi, M.S. candidate. Her research interests include machine learning and artificial intelligence.
    CHANG Liang, born in 1980, Ph.D., professor. His research interests include data and knowledge engineering, formal methods, and trusted software.
    QIANG Baohua, born in 1972, Ph.D., professor. His research interests include big data analysis and image processing.
    ZHANG Shihao, born in 1991, Ph.D. candidate. His research interests include human skeleton key point detection and image processing.
    XIE Wu, born in 1979, Ph.D., associate professor. His research interests include data mining and information processing.
    YANG Minghao, born in 1977, Ph.D., associate research fellow. His research interests include multimodal information fusion and human-machine cooperation.
  • Supported by:
    Natural Science Foundation of Guangxi (2019GXNSFDA185006); Guangxi Science and Technology Base and Talent Project (Guike AD19110137)



Current skill-teaching methods for manipulators mainly use three-dimensional reconstruction to build a virtual space in which the manipulator is simulated and trained. However, because humans and manipulators view the scene from different angles, traditional visual information reconstruction methods suffer from large reconstruction errors and long reconstruction times, and they require strict experimental environments and many sensors. As a result, the skills that the manipulator learns in the virtual space do not transfer well to the real environment. To solve these problems, a real-time visual information reconstruction method for manipulator operation was proposed. Firstly, information was extracted from real-time RGB images by a Mask-Region Convolutional Neural Network (Mask-RCNN). Then, the extracted RGB images and other visual information were jointly encoded, and the visual information was mapped to three-dimensional position information in the manipulator operation space by Residual Neural Network-18 (ResNet-18). Finally, an outlier adjustment method constrained by Cluster Center DIStance (CC-DIS) was proposed to reduce the reconstruction error, and the adjusted position information was visualized with the Open Graphics Library (OpenGL), completing the real-time three-dimensional reconstruction of the manipulator operation space. Experimental results show that the proposed method is both fast and accurate: a single three-dimensional reconstruction takes only 62.92 ms, giving a reconstruction speed of up to 16 frames per second, with a relative reconstruction error of about 5.23%. The method can therefore be applied effectively to manipulator skill-teaching tasks.

Key words: skill teaching, Mask-Region Convolutional Neural Network (Mask-RCNN), Residual Neural Network-18 (ResNet-18), three-dimensional real-time reconstruction, manipulator
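
The abstract names the CC-DIS outlier adjustment step but does not give its details. The following is a minimal sketch of one possible reading, assuming that "cluster center distance constrained" means limiting how far a predicted position may lie from the centroid of its cluster. It is not the authors' algorithm: the function names, the `max_dist` parameter, and the sample positions are all hypothetical.

```python
import math


def cluster_center(points):
    """Return the centroid of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def adjust_outliers(points, max_dist):
    """Pull every point farther than max_dist from the centroid back onto that radius.

    max_dist is a hypothetical tuning parameter, not a value from the paper.
    """
    if not points:
        return []
    center = cluster_center(points)
    adjusted = []
    for p in points:
        d = math.dist(p, center)
        if d <= max_dist:
            adjusted.append(tuple(p))  # inlier: keep the predicted position
        else:
            scale = max_dist / d  # shrink the offset from the centre
            adjusted.append(
                tuple(c + (x - c) * scale for x, c in zip(p, center))
            )
    return adjusted


# Hypothetical positions (metres) predicted for one object over four frames;
# the last one is an obvious outlier.
predictions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (4.0, 0.0, 0.0)]
smoothed = adjust_outliers(predictions, max_dist=1.5)
```

In this sketch the three consistent predictions are left untouched, while the outlier is moved along the line towards the centroid until it sits exactly `max_dist` away. Because the centroid is a plain mean, a single extreme prediction still shifts it; a median-based centre would be a more robust choice under the same assumption.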



