Weakly perceived object detection method based on point cloud completion and multi-resolution Transformer

doi:10.11772/j.issn.1001-9081.2022060908

Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (7): 2155-2165.DOI: 10.11772/j.issn.1001-9081.2022060908

Special Issue: Artificial Intelligence

• Artificial intelligence •

Jing ZHOU¹, Yiyu HU¹, Chengyu HU², Tianjiang WANG³

1. School of Artificial Intelligence, Jianghan University, Wuhan Hubei 430056, China
2. School of Computer Science, China University of Geosciences, Wuhan Hubei 430074, China
3. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan Hubei 430074, China

Received: 2022-06-23  Revised: 2022-09-16  Accepted: 2022-09-22  Online: 2022-10-18  Published: 2023-07-10
Contact: Jing ZHOU
About the authors: ZHOU Jing, born in 1981, Ph.D., professor. Her research interests include three-dimensional object detection and deep learning.
HU Yiyu, born in 1999, M.S. candidate. Her research interests include object detection and deep learning.
HU Chengyu, born in 1978, Ph.D., professor. His research interests include intelligent computing and deep learning.
WANG Tianjiang, born in 1960, Ph.D., professor. His research interests include computer vision and deep learning.
Supported by:
National Natural Science Foundation of China (62106086); Natural Science Foundation of Hubei Province (2021CFB564)

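Author biographies: ZHOU Jing (born 1981), female, from Xiangyang, Hubei; professor, Ph.D.; main research interests: three-dimensional object detection, deep learning;
HU Yiyu (born 1999), female, from Xiantao, Hubei; M.S. candidate; main research interests: object detection, deep learning;
HU Chengyu (born 1978), male, from Zaoyang, Hubei; professor, Ph.D., CCF member; main research interests: intelligent computing, deep learning;
WANG Tianjiang (born 1960), male, from Wuhan, Hubei; professor, Ph.D.; main research interests: computer vision, deep learning.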

Abstract:

To address the low detection precision for weakly perceived objects whose shapes are incomplete in distant or occluded scenes, a Weakly Perceived object detection method based on point cloud Completion and Multi-resolution Transformer (WP-CMT) was proposed. Firstly, because the down-sampling convolutions in an object detection network discard some key information, the Part-Aware and Aggregation (Part-A²) method, which has a deconvolution up-sampling structure, was chosen as the base network to generate the initial proposals. Then, to enhance the shape and position features of the weakly perceived objects in the initial proposals, a point cloud completion module was applied to reconstruct dense point sets on the surfaces of these objects, and a novel multi-resolution Transformer feature encoding module was constructed to aggregate the completed shape features with the original spatial location information of the objects. The enhanced local features of the weakly perceived objects were then captured by progressively encoding the contextual semantic correlation of the aggregated features on local coordinate point sets of different resolutions. Finally, the refined bounding boxes were generated. Experimental results show that, for weakly perceived objects at the hard level, WP-CMT improves on the baseline method Part-A² by 2.51 percentage points in Average Precision (AP) on the KITTI dataset and by 1.59 percentage points in mean Average Precision (mAP) on the Waymo dataset, which verifies the effectiveness of the proposed method for weakly perceived object detection. Meanwhile, ablation experimental results show that the point cloud completion and multi-resolution Transformer feature encoding modules in WP-CMT effectively improve the detection performance on weakly perceived objects for different Region Proposal Network (RPN) structures.

Key words: three-dimensional object detection, weakly perceived object, point cloud completion, feature encoding, multi-resolution Transformer
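
The multi-resolution encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' module: the function names, the identity (unlearned) attention projections, the random subsampling, the resolutions (32, 8, 2), the five shape-feature channels and the max-pooling are all assumptions made for illustration. The sketch fuses each point's local coordinates with a stand-in "completed shape" feature, runs scaled dot-product self-attention on subsets of decreasing size, and concatenates the pooled results.

```python
import math
import random


def self_attention(rows):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in rows]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, rows)) / total
                    for j in range(dim)])
    return out


def multi_resolution_encode(coords, shape_feats, resolutions=(32, 8, 2), seed=0):
    """Hypothetical encoder: attention at several point-set sizes, then pooling."""
    rng = random.Random(seed)
    # Aggregate spatial location with (stand-in) completed-shape features.
    fused = [list(c) + list(f) for c, f in zip(coords, shape_feats)]
    pooled = []
    for n in resolutions:
        subset = rng.sample(fused, min(n, len(fused)))  # assumed: random subsampling
        encoded = self_attention(subset)
        pooled.extend(max(col) for col in zip(*encoded))  # channel-wise max-pool
    return pooled


rng = random.Random(1)
coords = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(64)]       # local xyz
shape_feats = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(64)]  # toy features
descriptor = multi_resolution_encode(coords, shape_feats)
print(len(descriptor))  # 3 resolutions x 8 channels = 24
```

A real implementation would use learned query, key and value projections and a principled sampling scheme; the sketch only shows how features from several resolutions end up in one descriptor.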


CLC Number:

TP391.41

Jing ZHOU, Yiyu HU, Chengyu HU, Tianjiang WANG. Weakly perceived object detection method based on point cloud completion and multi-resolution Transformer[J]. Journal of Computer Applications, 2023, 43(7): 2155-2165.


Figures/Tables 16

References 31

1	CHEN X Z, MA H M, WAN J, et al. Multi-view 3D object detection network for autonomous driving [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6526-6534. DOI: 10.1109/cvpr.2017.691
2	KU J, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation [C]// Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2018: 1-8. DOI: 10.1109/iros.2018.8594049
3	LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7337-7345. DOI: 10.1109/cvpr.2019.00752
4	ZHOU Y, SUN P, ZHANG Y, et al. End-to-end multi-view fusion for 3D object detection in LiDAR point clouds [C]// Proceedings of the 3rd Conference on Robot Learning. New York: JMLR.org, 2020: 923-932.
5	DENG J J, ZHOU W G, ZHANG Y Y, et al. From multi-view to Hollow-3D: hallucinated Hollow-3D R-CNN for 3D object detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(12): 4722-4734. DOI: 10.1109/tcsvt.2021.3100848
6	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 77-85. DOI: 10.1109/cvpr.2017.16
7	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5105-5114.
8	QI C R, LIU W, WU C X, et al. Frustum PointNets for 3D object detection from RGB-D data [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 918-927. DOI: 10.1109/cvpr.2018.00102
9	QI C R, LITANY O, HE K M, et al. Deep Hough voting for 3D object detection in point clouds [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9276-9285. DOI: 10.1109/iccv.2019.00937
10	SHI S S, WANG X G, LI H S. PointRCNN: 3D object proposal generation and detection from point cloud [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 770-779. DOI: 10.1109/cvpr.2019.00086
11	YANG Z T, SUN Y N, LIU S, et al. 3DSSD: point-based 3D single stage object detector [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11037-11045. DOI: 10.1109/cvpr42600.2020.01105
12	MISRA I, GIRDHAR R, JOULIN A. An end-to-end transformer model for 3D object detection [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 2886-2897. DOI: 10.1109/iccv48922.2021.00290
13	LIU Z, ZHANG Z, CAO Y, et al. Group-free 3D object detection via transformers [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 2929-2938. DOI: 10.1109/iccv48922.2021.00294
14	PAN X R, XIA Z F, SONG S J, et al. 3D object detection with Pointformer [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 7459-7468. DOI: 10.1109/cvpr46437.2021.00738
15	SUN L J, ZHAO J, WANG W J, et al. Multi-scale transformer LiDAR point cloud 3D object detection [J]. Computer Engineering and Applications, 2022, 58(8): 136-146. DOI: 10.3778/j.issn.1002-8331.2109-0489 (in Chinese)
16	ZHANG Y F, HU Q Y, XU G Q, et al. Not all points are equal: learning highly efficient point-based detectors for 3D LiDAR point clouds [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18931-18940. DOI: 10.1109/cvpr52688.2022.01838
17	YAN Y, MAO Y X, LI B. SECOND: sparsely embedded convolutional detection [J]. Sensors, 2018, 18(10): No. 3337. DOI: 10.3390/s18103337
18	LANG A H, VORA S, CAESAR H, et al. PointPillars: fast encoders for object detection from point clouds [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 12689-12697. DOI: 10.1109/cvpr.2019.01298
19	LIU Z, ZHAO X, HUANG T T, et al. TANet: robust 3D object detection from point clouds with triple attention [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 11677-11684. DOI: 10.1609/aaai.v34i07.6837
20	YE M S, XU S J, CAO T Y. HVNet: hybrid voxel network for LiDAR based 3D object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1628-1637. DOI: 10.1109/cvpr42600.2020.00170
21	DENG J J, SHI S S, LI P W, et al. Voxel R-CNN: towards high performance voxel-based 3D object detection [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 1201-1209. DOI: 10.1609/aaai.v35i2.16207
22	ZHANG W C, LI W, XU D. SRDAN: scale-aware and range-aware domain adaptation network for cross-dataset 3D object detection [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 6765-6775. DOI: 10.1109/cvpr46437.2021.00670
23	HE C H, LI R H, LI S, et al. Voxel set Transformer: a set-to-set approach to 3D object detection from point clouds [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8407-8417. DOI: 10.1109/cvpr52688.2022.00823
24	HE C H, ZENG H, HUANG J Q, et al. Structure aware single-stage 3D object detection from point cloud [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11870-11879. DOI: 10.1109/cvpr42600.2020.01189
25	SHI S S, WANG Z, SHI J P, et al. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2647-2664.
26	XIE L, XIANG C, YU Z X, et al. PI-RCNN: an efficient multi-sensor 3D object detector with point-based attentive Cont-Conv fusion module [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 12460-12467. DOI: 10.1609/aaai.v34i07.6933
27	CHEN Y L, LIU S, SHEN X Y, et al. Fast point R-CNN [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9774-9783. DOI: 10.1109/iccv.2019.00987
28	DU L, YE X Q, TAN X, et al. Associate-3Ddet: perceptual-to-conceptual association for 3D point cloud object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13326-13335. DOI: 10.1109/cvpr42600.2020.01334
29	PEI Y Y, GUO H M, ZHANG D P, et al. Robust 3D object detection method based on localization uncertainty [J]. Journal of Computer Applications, 2021, 41(10): 2979-2984. DOI: 10.11772/j.issn.1001-9081.2020122055 (in Chinese)
30	NGIAM J, CAINE B, HAN W, et al. StarNet: targeted computation for object detection in point clouds [EB/OL]. (2019-12-02) [2022-06-19].
31	MAO J G, NIU M Z, BAI H Y, et al. Pyramid R-CNN: towards better performance and adaptability for 3D object detection [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 2703-2712. DOI: 10.1109/iccv48922.2021.00272

Method	mAP/%	AP_3D/% (IoU=0.7), hard	Time/ms
AVOD [2]	75.83	68.65	100
F-PointNet [8]	72.78	63.65	170
PI-RCNN [26]	80.56	76.17	107
SECOND [17]	81.48	77.22	50
PointPillars [18]	79.76	74.77	16
Fast Point R-CNN [27]	81.87	77.48	65
TANet [19]	80.56	75.62	38
Associate-3Ddet [28]	82.07	77.76	60
HVNet [20]	78.86	71.79	32
PointRCNN [10]	81.63	77.38	100
SECOND-Gaussian+ [29]	81.69	77.44	68
SA-SSD [24]	82.95	78.78	40
Part-A² [25]	82.49	78.54	83
WP-CMT (proposed method)	84.24	79.22	87
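
The accuracy and runtime columns of the table above can be read together as a cost-benefit comparison against the Part-A² baseline. The short sketch below only redoes that arithmetic from the two relevant rows; the dictionary layout and variable names are illustrative assumptions, not anything defined by the paper.

```python
# Rows copied from the table above: mAP/%, hard-level AP_3D/% (IoU=0.7), time/ms.
part_a2 = {"mAP": 82.49, "AP_hard": 78.54, "time_ms": 83}
wp_cmt = {"mAP": 84.24, "AP_hard": 79.22, "time_ms": 87}

# Accuracy gains in percentage points.
map_gain = round(wp_cmt["mAP"] - part_a2["mAP"], 2)
hard_gain = round(wp_cmt["AP_hard"] - part_a2["AP_hard"], 2)

# Extra inference time, absolute and relative to the baseline.
extra_ms = wp_cmt["time_ms"] - part_a2["time_ms"]
overhead_pct = round(100 * extra_ms / part_a2["time_ms"], 1)

print(f"mAP gain: {map_gain} points, hard AP gain: {hard_gain} points")
print(f"extra time: {extra_ms} ms ({overhead_pct}% over Part-A2)")
```

On these figures the refinement stage adds 4 ms per frame (about 4.8%) for a gain of 1.75 points in mAP.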

AP_3D/% (IoU=0.7) by difficulty level
Method	Easy	Moderate	Hard
AVOD [2]	83.07	71.76	65.73
F-PointNet [8]	82.19	69.79	60.59
PI-RCNN [26]	84.37	74.82	70.03
SECOND [17]	83.13	73.66	66.20
PointPillars [18]	82.58	74.31	68.99
Fast Point R-CNN [27]	85.29	77.40	70.24
TANet [19]	83.81	75.38	67.66
Associate-3Ddet [28]	85.99	77.40	70.53
PointRCNN [10]	86.96	75.64	70.70
MSPTRCNN [15]	87.45	77.44	70.39
Part-A² [25]	87.81	78.49	73.51
Pointformer [14]	87.13	77.06	69.25
SA-SSD [24]	88.75	79.79	74.16
WP-CMT (proposed method)	87.47	80.52	76.02

Method	mAP/% (IoU=0.7), Level 1	mAP/% (IoU=0.7), Level 2	mAPH/% (IoU=0.7), Level 1	mAPH/% (IoU=0.7), Level 2
PointPillars [18]	63.30	55.20	62.70	54.70
SECOND [17]	68.03	59.57	67.44	59.04
StarNet [30]	64.70	45.50	56.30	39.60
MVF [4]	62.93	—	—	—
PointRCNN [10]	45.05	37.41	44.25	36.74
Pyramid-P [31]	47.02	39.10	46.58	38.76
Part-A² [25]	71.69	64.21	71.16	63.70
WP-CMT	73.04	65.80	72.52	65.31
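
The two gains quoted in the abstract (2.51 and 1.59 percentage points over Part-A²) can be re-derived from the tables on this page, assuming they refer to the hard-level AP in the easy/moderate/hard table and to the Level 2 mAP in the table above. The sketch below is only that subtraction; the dictionary keys are illustrative names.

```python
# Part-A2 and WP-CMT rows copied from the tables on this page.
baseline = {
    "kitti_ap_hard": 73.51,
    "waymo_map_l1": 71.69, "waymo_map_l2": 64.21,
    "waymo_maph_l1": 71.16, "waymo_maph_l2": 63.70,
}
proposed = {
    "kitti_ap_hard": 76.02,
    "waymo_map_l1": 73.04, "waymo_map_l2": 65.80,
    "waymo_maph_l1": 72.52, "waymo_maph_l2": 65.31,
}

# Gain of WP-CMT over Part-A2 in percentage points, per column.
gains = {key: round(proposed[key] - baseline[key], 2) for key in baseline}

for key, value in gains.items():
    print(f"{key}: +{value:.2f}")
```

The `kitti_ap_hard` and `waymo_map_l2` entries reproduce the 2.51 and 1.59 figures; the remaining Waymo columns show gains of a similar size.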

AP_3D/% at IoU=0.7 and IoU=0.5
Experiment	Method	Easy (IoU=0.7)	Moderate (IoU=0.7)	Hard (IoU=0.7)	Easy (IoU=0.5)	Moderate (IoU=0.5)	Hard (IoU=0.5)
A	Ablation baseline	88.98	79.11	77.82	90.59	89.20	89.14
B	Ablation baseline + MTE	89.18	81.77	78.24	95.64	89.61	89.33
C	Ablation baseline + TPC	89.03	80.67	78.02	94.77	89.52	89.18
D	WP-CMT	89.58	83.93	79.22	97.14	89.67	89.35
E	Part-A²	89.47	79.47	78.54	94.71	89.27	89.03
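
One way to read the ablation rows A to D is to separate the contribution of each module from their combined effect. The sketch below does this for the moderate-level AP at IoU=0.7, using the labels MTE and TPC exactly as they appear in the table; the variable names and the "interaction" term are illustrative choices, not quantities reported by the paper.

```python
# Moderate-level AP_3D/% at IoU=0.7, copied from experiments A-D above.
ap_moderate = {
    "baseline": 79.11,       # A
    "baseline+MTE": 81.77,   # B
    "baseline+TPC": 80.67,   # C
    "WP-CMT": 83.93,         # D (both modules)
}

base = ap_moderate["baseline"]
mte_gain = round(ap_moderate["baseline+MTE"] - base, 2)
tpc_gain = round(ap_moderate["baseline+TPC"] - base, 2)
joint_gain = round(ap_moderate["WP-CMT"] - base, 2)

# Interaction: how much the joint gain exceeds the sum of the individual gains.
interaction = round(
    ap_moderate["WP-CMT"] - ap_moderate["baseline+MTE"]
    - ap_moderate["baseline+TPC"] + base, 2)

print(mte_gain, tpc_gain, joint_gain, interaction)
```

On these numbers the two modules together add 4.82 points, which is 0.60 more than the sum of their separate gains (2.66 and 1.56).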

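Steps of the AF sub-module (first four columns) and AP_3D/% at IoU=0.7 (last three columns)
Concatenation	Weighting	Activation	Residual	Easy	Moderate	Hard
√	×	×	×	89.23	82.95	78.80
√	√	×	×	89.45	83.82	79.15
√	√	√	×	89.49	83.85	79.18
√	√	√	√	89.58	83.93	79.22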

AP_3D/% (IoU=0.7) by difficulty level
Experiment	Method	Easy	Moderate	Hard
F	PointNet++ encoding	89.29	82.99	78.56
G	MT encoding (proposed method)	89.58	83.93	79.22
