Classification algorithm for point cloud based on local-global interaction and structural Transformer

doi:10.11772/j.issn.1001-9081.2024050572

Abstract

To address the insufficient extraction of local and global features in point cloud classification, a point cloud classification algorithm based on local-global interaction and a structural Transformer was proposed. Firstly, a dual-branch parallel local-global interaction framework was designed to extract local and global features separately: one branch extracted local features with max pooling and convolution, and the other extracted global features with average pooling and a Transformer. Meanwhile, considering the importance of position information in the Transformer, a structural Transformer was proposed that further enhances global structural features by repeatedly applying interaction between the position information and the current features. Finally, the local-global features were used to complete the point cloud classification task. Experimental results show that the proposed algorithm achieves Overall Accuracies (OAs) of 93.6% and 87.5% on the ModelNet40 and ScanObjectNN benchmark datasets, respectively. These results indicate that the proposed local-global interaction and structural Transformer network performs well on the point cloud classification task.
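
The dual-branch idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' LGSTNet implementation: the neighbour grouping by k-nearest neighbours, the layer sizes, the number of position-feature interactions, the concatenation-based fusion, and all function and parameter names (`init_params`, `local_branch`, `global_branch`, `classify`) are assumptions made for illustration only.

```python
import numpy as np


def init_params(rng, dim=32, n_classes=40):
    """Random weights for the sketch; every name and shape here is hypothetical."""
    def w(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))
    return {
        "w_local": w(3, dim), "w_in": w(3, dim), "w_pos": w(3, dim),
        "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim),
        "w_cls": w(2 * dim, n_classes),
    }


def knn_indices(points, k):
    """Indices of the k nearest neighbours of each point (the point itself included)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)           # (N, N) squared distances
    return np.argsort(dist2, axis=1)[:, :k]      # (N, k)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_branch(points, idx, p):
    """Local branch: shared linear map (a 1x1 convolution) + ReLU, then max pooling."""
    grouped = points[idx]                         # (N, k, 3)
    hidden = np.maximum(grouped @ p["w_local"], 0.0)
    return hidden.max(axis=1)                     # (N, D)


def global_branch(points, idx, p, n_interactions=2):
    """Global branch: average pooling, then self-attention over all points.

    The position embedding is re-added to the current features before each
    attention step, an assumed reading of "interaction of position
    information with current features for several times".
    """
    tokens = points[idx].mean(axis=1) @ p["w_in"]  # (N, D)
    pos = points @ p["w_pos"]                      # (N, D) position embedding
    for _ in range(n_interactions):
        x = tokens + pos
        q, key, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
        attn = softmax(q @ key.T / np.sqrt(q.shape[-1]))  # (N, N)
        tokens = tokens + attn @ v                 # residual update
    return tokens


def classify(points, p, k=16):
    """Fuse both branches and return class logits for one point cloud."""
    idx = knn_indices(points, k)
    fused = np.concatenate(
        [local_branch(points, idx, p), global_branch(points, idx, p)], axis=1)
    shape_feature = fused.max(axis=0)              # (2D,) order-independent pooling
    return shape_feature @ p["w_cls"]              # (n_classes,)
```

With `rng = np.random.default_rng(0)`, calling `classify(rng.normal(size=(64, 3)), init_params(rng))` returns a vector of 40 logits. Because every pooling step is symmetric, the result does not depend on the order of the input points, which is the property a point cloud classifier needs.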

Key words: deep learning, point cloud classification, local-global interaction, structural Transformer

CLC Number:

TP391.4

Kai CHEN, Hailiang YE, Feilong CAO. Classification algorithm for point cloud based on local-global interaction and structural Transformer[J]. Journal of Computer Applications, 2025, 45(5): 1671-1676.

Figures/Tables 11

Fig. 1 Framework of LGSTNet

Fig. 2 Local-global interaction framework

Fig. 3 Structural Transformer module

Tab. 1 Classification performance comparison on ModelNet40 dataset

Algorithm	Input format	Number of input points	mAcc/%	OA/%
PointNet [2017]	Coordinates	1024	86.0	89.2
PointNet++ [2017]	Coordinates + normals	5000	-	91.9
RS-CNN [2019]	Coordinates	1024	-	92.9
DGCNN [2019]	Coordinates	1024	90.2	92.9
KPConv [2019]	Coordinates	6800	-	92.9
DRNet [2021]	Coordinates	1024	-	93.1
PRA-Net [2021]	Coordinates	1024	90.6	93.2
PCT [2021]	Coordinates	1024	-	93.2
CT [2021]	Coordinates	1024	90.8	93.1
Point-BERT [2022]	Coordinates	1024	-	93.2
PatchFormer [2022]	Coordinates	1024	-	93.2
CSANet [2022]	Coordinates	1024	89.9	92.8
LFT-Net [2023]	Coordinates + normals	1024	89.7	93.2
AGConv [2023]	Coordinates	1024	90.7	93.4
LGSTNet	Coordinates	1024	90.8	93.6
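
The two metrics reported in these tables can differ noticeably on imbalanced data. The sketch below assumes the conventional definitions: OA is the fraction of all samples classified correctly, and mAcc is the unweighted mean of per-class accuracies. The function names and the toy labels are illustrative and do not come from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (each class counts equally)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example: 8 chairs (all correct) and 2 lamps (one misclassified).
y_true = ["chair"] * 8 + ["lamp"] * 2
y_pred = ["chair"] * 8 + ["chair", "lamp"]
oa = overall_accuracy(y_true, y_pred)      # 9 of 10 samples correct
macc = mean_class_accuracy(y_true, y_pred)  # mean of 8/8 and 1/2
```

In this toy case OA is 0.9 while mAcc is 0.75, because the rare class weighs as much as the frequent one in mAcc. That is why a method can rank differently under the two columns.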

Tab. 2 Classification performance comparison on ScanObjectNN dataset

Algorithm	mAcc/%	OA/%
PointNet [2017]	63.4	68.2
PointNet++ [2017]	75.4	77.9
DGCNN [2019]	73.6	78.1
MVTN [2021]	-	82.8
DRNet [2021]	78.0	80.3
CT [2021]	83.1	85.5
PointFormer [2022]	78.9	81.1
Point-BERT [2022]	-	83.1
PointMLP [2022]	83.9	85.4
RepSurf-U [2022]	83.1	86.0
GLSCN [2023]	84.1	85.8
Point-PN [2023]	-	87.1
LGSTNet	86.5	87.5

Fig. 4 t-SNE visual comparison on ModelNet40 dataset

Fig. 5 t-SNE visual comparison on ScanObjectNN dataset

Tab. 3 Module ablation experimental results

Model	Local-global interaction framework: local feature branch	Local-global interaction framework: global feature branch	Structural Transformer	OA/%	mAcc/%
A	√			86.0	84.5
B		√		80.7	77.8
C	√	√		86.7	85.2
D	√	√	√	87.5	86.5

Tab. 4 Complexity comparison of different algorithms on ScanObjectNN dataset

Algorithm	Parameters/10⁶	Throughput/(shape·s⁻¹)	GFLOPs	OA/%
PointNet	3.47	518	0.45	68.2
PointNet++	1.74	29	4.03	77.9
DGCNN	1.81	104	2.43	78.1
CT	22.91	16	12.69	85.5
PointFormer	3.99	94	3.48	81.1
PointMLP	12.60	19	15.70	85.5
LGSTNet	4.96	151	3.36	87.5
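
The throughput column is expressed in shapes per second. A minimal way to obtain such a figure is sketched below, assuming throughput is simply the number of shapes processed divided by wall-clock inference time after a warm-up run. The source does not describe its measurement protocol, so `measure_throughput`, its parameters, and the dummy model are hypothetical.

```python
import time


def measure_throughput(infer, batches, warmup=1):
    """Return shapes processed per second by `infer` over `batches`.

    `infer` is any callable that takes one batch (a sequence of shapes).
    The first `warmup` batches are run once untimed, then all batches are timed.
    """
    for batch in batches[:warmup]:
        infer(batch)                      # warm-up run, excluded from timing
    n_shapes = 0
    start = time.perf_counter()
    for batch in batches:
        infer(batch)
        n_shapes += len(batch)
    elapsed = time.perf_counter() - start
    return n_shapes / elapsed


def dummy_model(batch):
    """Stand-in for a real network: sleeps 1 ms per batch."""
    time.sleep(0.001)
    return [0 for _ in batch]


batches = [[None] * 8 for _ in range(4)]   # 4 batches of 8 shapes each
throughput = measure_throughput(dummy_model, batches)
```

The value depends on hardware, batch size, and number of input points, so figures from different setups are only comparable when those are held fixed.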

Tab. 5 Influence of k on performance on ScanObjectNN dataset

k	mAcc/%	OA/%
8	81.8	84.0
12	83.3	85.4
16	86.5	87.5
20	84.8	86.7
24	84.8	86.6

Tab. 6 Influence of i on performance on ScanObjectNN dataset

i	mAcc/%	OA/%	Parameters/10⁶
3	84.6	85.6	1.40
4	86.5	87.5	4.96
5	85.6	87.5	18.89

References 41

1	ZHENG C, YAN X, ZHANG H, et al. Beyond 3D Siamese tracking: a motion-centric paradigm for 3D single object tracking in point clouds[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8101-8110.
2	CHEN X, MA H, WAN J, et al. Multi-view 3D object detection network for autonomous driving[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6526-6534.
3	TU C, TAKEUCHI E, CARBALLO A, et al. Point cloud compression for 3D LiDAR sensor using recurrent neural network with residual blocks[C]// Proceedings of the 2019 International Conference on Robotics and Automation. Piscataway: IEEE, 2019: 3274-3280.
4	SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 945-953.
5	MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition[C]// Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2015: 922-928.
6	HAMDI A, GIANCOLA S, GHANEM B. MVTN: multi-view transformation network for 3D shape recognition[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 1-11.
7	QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 77-85.
8	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
9	MA X, QIN C, YOU H, et al. Rethinking network design and local geometry in point cloud: a simple residual MLP framework[EB/OL]. [2024-10-22].
10	ZHANG R, WANG L, WANG Y, et al. Starting from non-parametric networks for 3D point cloud analysis[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 5344-5353.
11	WEI M, WEI Z, ZHOU H, et al. AGConv: adaptive graph convolution on 3D point clouds[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9374-9392.
12	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6410-6419.
13	LIU Y, FAN B, XIANG S, et al. Relation-shape convolutional neural network for point cloud analysis[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8887-8896.
14	SIMONOVSKY M, KOMODAKIS N. Dynamic edge-conditioned filters in convolutional neural networks on graphs[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 29-38.
15	DU Z, YE H, CAO F. A novel local-global graph convolutional method for point cloud semantic segmentation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(4): 4798-4812.
16	WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): No.146.
17	QIU S, ANWAR S, BARNES N. Dense-resolution network for point cloud classification and segmentation[C]// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 3812-3821.
18	RAN H, LIU J, WANG C. Surface representation for point clouds[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 18920-18930.
19	LIANG J, DU Z, LIANG J, et al. Long and short-range dependency graph structure learning framework on point cloud[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(12): 14975-14989.
20	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
21	BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 1877-1901.
22	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-10-22].
23	WANG W, XIE E, LI X, et al. Pyramid vision Transformer: a versatile backbone for dense prediction without convolutions[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 548-558.
24	WU H, XIAO B, CODELLA N, et al. CVT: introducing convolutions to vision Transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 22-31.
25	ZHAO H, JIANG L, JIA J, et al. Point Transformer[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 16239-16248.
26	GUO M H, CAI J X, LIU Z N, et al. PCT: point cloud Transformer[J]. Computational Visual Media, 2021, 7(2): 187-199.
27	WANG G, ZHAI Q, LIU H. Cross self-attention network for 3D point cloud[J]. Knowledge-Based Systems, 2022, 247: No.108769.
28	YU X, TANG L, RAO Y, et al. Point-BERT: pre-training 3D point cloud Transformers with masked point modeling[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 19291-19300.
29	CHENG S, CHEN X, HE X, et al. PRA-Net: point relation-aware network for 3D point cloud analysis[J]. IEEE Transactions on Image Processing, 2021, 30: 4436-4448.
30	CHEN Y, YANG Z, ZHENG X, et al. PointFormer: a dual perception attention-based network for point cloud classification[C]// Proceedings of the Asian Conference on Computer Vision, LNCS 13841. Cham: Springer, 2023: 432-449.
31	MAZUR K, LEMPITSKY V. Cloud Transformers: a universal approach to point cloud processing tasks[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10695-10704.
32	ZHOU W, ZHAO Y, XIAO Y, et al. TNPC: Transformer-based network for point cloud classification[J]. Expert Systems with Applications, 2024, 239: No.122438.
33	ZHANG C, WAN H, SHEN X, et al. PatchFormer: an efficient point Transformer with patch attention[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11789-11798.
34	GAO Y, LIU X, LI J, et al. LFT-Net: local feature Transformer network for point clouds analysis[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(2): 2158-2168.
35	LAI X, LIU J, JIANG L, et al. Stratified Transformer for 3D point cloud segmentation[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8490-8499.
36	LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
37	WU K, PENG H, CHEN M, et al. Rethinking and improving relative position encoding for Vision Transformer[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10013-10021.
38	SI C, YU W, ZHOU P, et al. Inception Transformer[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 23495-23509.
39	WU Z, SONG S, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920.
40	UY M A, PHAM Q H, HUA B S, et al. Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1588-1597.
41	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9: 2579-2605.
