Journal of Computer Applications › 2024, Vol. 44, Issue (6): 1920-1926. DOI: 10.11772/j.issn.1001-9081.2023060866

• Multimedia computing and computer simulation

Six degrees of freedom object pose estimation algorithm based on filter learning network

Yaxing BING¹, Yangping WANG¹,², Jiu YONG², Haomou BAI³

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, Gansu 730070, China
  2. Gansu Artificial Intelligence and Graphics and Image Processing Engineering Research Center (Lanzhou Jiaotong University), Lanzhou, Gansu 730070, China
  3. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
  • Received: 2023-07-03; Revised: 2023-09-06; Accepted: 2023-09-11; Online: 2023-10-07; Published: 2024-06-10
  • Contact: Yaxing BING
  • About the authors: WANG Yangping, born in 1973, a native of Dazhou, Sichuan, Ph. D., professor. Her research interests include digital image processing and virtual reality.
    YONG Jiu, born in 1993, a native of Linxia, Gansu, Ph. D. candidate, engineer. His research interests include digital image processing and virtual reality.
    BAI Haomou, born in 1996, a native of Yuzhong, Gansu, M. S. His research interests include image super-resolution and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62067006); Humanities and Social Sciences Research Project of the Ministry of Education (21YJC880085); Gansu Natural Science Foundation (23JRRA845)

Abstract:

A Six-Degree-of-Freedom (6D) object pose estimation algorithm based on a filter learning network was proposed to address the accuracy and real-time performance problems of pose estimation for weakly textured objects in complex scenes. Firstly, standard convolutions were replaced with Blueprint Separable Convolutions (BSConv) to reduce model parameters, and the GeLU (Gaussian error Linear Unit) activation function, which better approximates the normal distribution, was adopted to improve the performance of the network model. Secondly, an Upsampling Filtering And Encoding information Module (UFAEM) was proposed to compensate for the loss of key information during upsampling. Finally, a Global Attention Mechanism (GAM) was proposed to add contextual information and extract information from the input feature maps more effectively. Experimental results on the public datasets LineMOD, YCB-Video, and Occlusion LineMOD show that the proposed algorithm improves accuracy while greatly reducing the number of network parameters: the parameter count is reduced by nearly three-quarters. Under the ADD(-S) metric, the accuracy of the proposed algorithm is about 1.2 percentage points higher than that of the Dual-Stream algorithm on the LineMOD dataset, about 5.2 percentage points higher than that of the DenseFusion algorithm on the YCB-Video dataset, and about 6.6 percentage points higher than that of the Pixel-wise Voting Network (PVNet) algorithm on the Occlusion LineMOD dataset. These results show that the proposed algorithm performs well in estimating the pose of weakly textured objects and is robust to a certain degree when estimating the pose of occluded objects.
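The accuracy comparisons above are stated in terms of ADD(-S). The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch assuming the convention commonly used on LineMOD-style benchmarks: ADD (mean distance between model points transformed by the ground-truth pose and by the predicted pose) for asymmetric objects, ADD-S (mean distance to the closest transformed point) for symmetric objects, and a pose counted as correct when the error is below 10% of the object diameter. The helper names (`transform`, `add_metric`, `adds_metric`, `diameter`, `is_correct`) and the toy point cloud are hypothetical illustrations, not the authors' evaluation code. YCB-Video results are also often reported as an area under an accuracy-threshold curve, which this sketch does not compute.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform R @ p + t to one 3D model point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_metric(R_gt: Mat3, t_gt: Vec3, R_pred: Mat3, t_pred: Vec3,
               points: Sequence[Vec3]) -> float:
    """ADD: mean distance between each model point under the ground-truth and predicted poses."""
    total = sum(math.dist(transform(R_gt, t_gt, p), transform(R_pred, t_pred, p))
                for p in points)
    return total / len(points)


def adds_metric(R_gt: Mat3, t_gt: Vec3, R_pred: Mat3, t_pred: Vec3,
                points: Sequence[Vec3]) -> float:
    """ADD-S: mean distance from each ground-truth-posed point to the closest predicted-posed point."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    return sum(min(math.dist(g, q) for q in pred) for g in gt) / len(gt)


def diameter(points: Sequence[Vec3]) -> float:
    """Largest distance between any two model points (brute force, O(n^2))."""
    return max(math.dist(a, b) for a in points for b in points)


def is_correct(error: float, points: Sequence[Vec3], ratio: float = 0.1) -> bool:
    """Assumed acceptance rule: error below `ratio` times the object diameter."""
    return error < ratio * diameter(points)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rot_z_180 = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # (x, y, z) -> (-x, -y, z)
    # Toy point cloud that looks identical after a 180-degree turn about the z axis.
    pts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    t = (0.0, 0.0, 0.5)
    add = add_metric(identity, t, rot_z_180, t, pts)
    adds = adds_metric(identity, t, rot_z_180, t, pts)
    print(add, adds, is_correct(add, pts), is_correct(adds, pts))  # 2.0 0.0 False True
```

In the toy example, a 180° rotation of a symmetric object produces a large ADD error but zero ADD-S error. This is why the "(-S)" variant is used only for symmetric objects: plain ADD would penalize poses that are visually indistinguishable from the ground truth.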

Key words: object pose estimation, blueprint separable convolution, attention mechanism, keypoint, deep learning
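Blueprint separable convolution, listed in the keywords, is the component the abstract credits with reducing model parameters. The sketch below counts the weights of one standard k×k convolution and of the unconstrained variant BSConv-U, which in the original BSConv formulation (Haase and Amthor) factors a layer into a 1×1 pointwise convolution followed by a k×k depthwise convolution. It assumes BSConv-U, no bias, and a hypothetical 128→128 channel, 3×3 layer; the abstract names neither the variant nor the layer shapes, and the function names are illustrative. It also includes the exact GeLU, x·Φ(x) with Φ the standard normal CDF, to show how the activation is tied to the normal distribution.

```python
import math


def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights of a standard k x k convolution: each output channel mixes all input channels."""
    return c_out * c_in * k * k + (c_out if bias else 0)


def bsconv_u_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights of BSConv-U (assumed variant): a 1x1 pointwise conv (c_in -> c_out)
    followed by a k x k depthwise conv with one kernel per output channel."""
    pointwise = c_out * c_in
    depthwise = c_out * k * k
    return pointwise + depthwise + (c_out if bias else 0)


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical layer shape; the paper's actual shapes are not given in the abstract.
    c_in, c_out, k = 128, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    bs = bsconv_u_params(c_in, c_out, k)
    print(std, bs, f"{1 - bs / std:.1%} fewer weights")
    print([round(gelu(x), 4) for x in (-2.0, 0.0, 2.0)])
```

For this single layer the count is 147,456 weights for the standard convolution versus 17,536 for BSConv-U, roughly 88% fewer. The whole-network reduction reported in the abstract (nearly three-quarters) is smaller, plausibly because not every layer is replaced and the added modules carry their own parameters; the abstract does not give this breakdown.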

CLC number: