Journal of Computer Applications, official website ›› 2022, Vol. 42 ›› Issue (12): 3715-3722. DOI: 10.11772/j.issn.1001-9081.2021101840

• Artificial Intelligence •


6D pose estimation incorporating attentional features for occluded objects

Kangzhe MA1, Jiatian PI2, Zhoubing XIONG3, Jia LYU1

  1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
    2. National Center for Applied Mathematics in Chongqing (Chongqing Normal University), Chongqing 401331, China
    3. Beijing Institute of Technology Chongqing Innovation Center, Chongqing 401120, China
  • Received:2021-10-28 Revised:2021-12-06 Accepted:2021-12-23 Online:2022-01-04 Published:2022-12-10
  • Contact: Jiatian PI
  • About author: MA Kangzhe, born in 1996 in Changzhi, Shanxi, M. S. candidate. His research interests include computer vision and object pose estimation.
    XIONG Zhoubing, born in 1984 in Chongqing, Ph. D., senior engineer. His research interests include autonomous driving, perceptual recognition, and high-precision mapping and localization.
    LYU Jia, born in 1978 in Dazhou, Sichuan, Ph. D., professor. Her research interests include data mining and machine learning.
  • Supported by:
    Key Project of Chongqing Municipal Education Commission Science and Technology Project (KJZD-K202114802); Youth Project of Chongqing Municipal Education Commission Science and Technology Project (KJQN201800116); Fund of Innovation Research Group of Chongqing Universities (CXQT20015); Graduate Research and Innovation Project in 2021 of Chongqing (CYS21272)


Abstract:

In the process of robotic vision grasping, it is difficult for existing algorithms to estimate the pose of a target object in real time, accurately and robustly under complex background, insufficient illumination, occlusion and other conditions. To address these problems, a keypoint-based 6D object pose estimation network with fused attention features was proposed. Firstly, a Convolutional Block Attention Module (CBAM) was added in the skip-connection stage to focus on spatial and channel information, so that the shallow features of the encoding stage were effectively fused with the deep features of the decoding stage, enhancing the spatial-domain information and the precise positional channel information of the feature maps. Secondly, the attention map of each keypoint was regressed in a weakly supervised way using a normalized loss function, and the attention map was used as the weight score of the keypoint offset at the corresponding pixel position. Finally, the keypoint coordinates were obtained by weighted summation. Experimental results demonstrate that the proposed network reaches 91.3% and 46.3% in the ADD(-S) metric on the LINEMOD and Occlusion LINEMOD datasets respectively, an improvement of 5.0 and 5.5 percentage points over the keypoint-based Pixel-wise Voting Network (PVNet), which verifies that the proposed network is more robust in occlusion scenes.
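The final aggregation step described in the abstract — treating each normalized attention map as per-pixel weight scores for keypoint offset votes and summing them into a coordinate — can be sketched as follows. This is an illustrative NumPy reconstruction under assumptions, not the authors' implementation; the function and variable names (`keypoint_from_attention`, `softmax2d`, `offsets`) are hypothetical, and softmax normalization over pixels is one plausible reading of "normalized":

```python
import numpy as np

def softmax2d(a):
    """Normalize an attention map over all pixels so it sums to 1."""
    e = np.exp(a - a.max())
    return e / e.sum()

def keypoint_from_attention(offsets, attention):
    """Estimate one keypoint as the attention-weighted sum of per-pixel votes.

    offsets:   (H, W, 2) array, each pixel's predicted (dx, dy) toward the keypoint
    attention: (H, W) unnormalized attention map for this keypoint
    Each pixel votes for (x + dx, y + dy); votes are combined with the
    normalized attention values as weight scores.
    """
    h, w, _ = offsets.shape
    ys, xs = np.mgrid[0:h, 0:w]
    votes = np.stack([xs + offsets[..., 0], ys + offsets[..., 1]], axis=-1)  # (H, W, 2)
    wgt = softmax2d(attention)[..., None]                                    # (H, W, 1)
    return (wgt * votes).reshape(-1, 2).sum(axis=0)                          # (x, y)

# Toy check: every pixel's offset points exactly at (3.0, 2.0), so any
# weighting that sums to 1 recovers that keypoint.
H, W = 5, 4
ys, xs = np.mgrid[0:H, 0:W]
true_kp = np.array([3.0, 2.0])
offs = np.stack([true_kp[0] - xs, true_kp[1] - ys], axis=-1).astype(float)
att = np.random.default_rng(0).normal(size=(H, W))
print(np.round(keypoint_from_attention(offs, att), 6))  # → [3. 2.]
```

Because the weights sum to 1, unreliable pixels (e.g. occluded regions, which would receive low attention) are softly suppressed rather than discarded, which is consistent with the robustness claim for occlusion scenes.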

Key words: object 6D pose estimation, attention mechanism, Convolutional Block Attention Module (CBAM), occluded object, keypoint

CLC number: