Point cloud classification and segmentation network based on dual attention mechanism and multi-scale fusion

doi:10.11772/j.issn.1001-9081.2024091254

Journal of Computer Applications, 2025, Vol. 45, Issue (9): 3003-3010. DOI: 10.11772/j.issn.1001-9081.2024091254

Multimedia computing and computer simulation

Weigang LI¹,², Jiale SHAO¹, Zhiqiang TIAN²

1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
2. Engineering Research Center for Metallurgical Automation and Measurement Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China

Received: 2024-09-05 Revised: 2024-10-16 Accepted: 2024-10-18 Online: 2024-10-31 Published: 2025-09-10
Contact: Jiale SHAO
About the authors: LI Weigang, born in 1977 in Xianning, Hubei, Ph.D., professor. His research interests include industrial process control, artificial intelligence, and machine learning.
TIAN Zhiqiang, born in 1996 in Wuhan, Hubei, Ph.D. candidate. His research interests include computer vision.
Supported by: Hubei Provincial Science and Technology Talent Serving Enterprise Project (202400288)

Abstract:

Existing networks struggle to learn the local geometric shape information of point clouds effectively, cannot focus on important feature structures, and fuse features insufficiently. To address these problems, a point cloud classification and segmentation network based on Dual Attention Mechanism (DAM) and multi-scale fusion was proposed. Firstly, in the feature extraction stage, Geometric Adaptive Convolution (GAC) was used to dynamically adjust the geometric positions and weights of the convolution kernels, so that the kernels adapt to the local geometric structure of the point cloud data and capture local features more effectively. Secondly, to further improve feature expression ability, the DAM was introduced to automatically learn and adjust the weights of the feature channels and spatial information, thereby enhancing the feature representation of key points. Finally, feature information at different scales was connected for effective fusion, which enriches the final feature representation and improves the classification and segmentation accuracy of the network. Experimental results on the ModelNet40, ShapeNet and S3DIS datasets show that the proposed network achieves higher Overall Accuracy (OA) and mean Intersection over Union (mIoU) than PointNet++ and DGCNN (Dynamic Graph Convolutional Neural Network), effectively improving point cloud classification and segmentation performance.

Key words: point cloud, classification and segmentation, deep learning, attention mechanism, feature fusion
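
The abstract describes the DAM and the multi-scale fusion only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Channel attention is reduced here to one sigmoid gate per channel computed from that channel's mean over all points, spatial attention to one sigmoid gate per point, and fusion to channel-wise concatenation of the features from two scales. All function names, weights and sizes are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feats, channel_w):
    """Re-weight every channel by a gate derived from its mean over all points."""
    n, c = len(feats), len(feats[0])
    means = [sum(p[j] for p in feats) / n for j in range(c)]
    gates = [sigmoid(channel_w[j] * means[j]) for j in range(c)]
    return [[p[j] * gates[j] for j in range(c)] for p in feats]

def spatial_attention(feats, point_w):
    """Re-weight every point by a gate derived from its own feature vector."""
    out = []
    for p in feats:
        gate = sigmoid(sum(w * x for w, x in zip(point_w, p)))
        out.append([x * gate for x in p])
    return out

def dual_attention(feats, channel_w, point_w):
    # Assumed order: channel gating first, then point-wise gating.
    return spatial_attention(channel_attention(feats, channel_w), point_w)

def fuse_scales(feature_sets):
    """Concatenate the per-point features of several scales along the channel axis."""
    return [sum(per_point, []) for per_point in zip(*feature_sets)]

# Toy input: two hypothetical scales, each with 8 points and 4 channels.
random.seed(0)
n_points, n_channels = 8, 4
scale_a = [[random.gauss(0, 1) for _ in range(n_channels)] for _ in range(n_points)]
scale_b = [[random.gauss(0, 1) for _ in range(n_channels)] for _ in range(n_points)]
channel_w = [random.gauss(0, 1) for _ in range(n_channels)]
point_w = [random.gauss(0, 1) for _ in range(n_channels)]

attended = [dual_attention(f, channel_w, point_w) for f in (scale_a, scale_b)]
fused = fuse_scales(attended)
print(len(fused), len(fused[0]))  # 8 points, 4 + 4 = 8 channels
```

Because both gates lie in (0, 1), this sketch can only attenuate features; a learned module would normally compute the gates with small trainable layers instead of the fixed random weights used here.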

CLC Number: TP391.4

Weigang LI, Jiale SHAO, Zhiqiang TIAN. Point cloud classification and segmentation network based on dual attention mechanism and multi-scale fusion[J]. Journal of Computer Applications, 2025, 45(9): 3003-3010.

Method	Input	mAcc/%	OA/%	Computation/GFLOPs
VoxNet	voxel	83.0	85.9	—
MVCNN	image	—	90.1	—
PointNet	point	86.2	89.2	0.440
PointNet++	point	88.3	90.7	0.870
DGCNN	point	90.2	92.9	2.450
PCNN	point	88.1	92.2	—
PointConv	point	—	92.5	—
PCT	point	—	93.2	2.320
PointWeb	point	89.4	92.3	—
Point Transformer	point	—	92.8	—
JGEKD	point	90.9	93.4	—
Proposed method	point	91.3	93.5	2.204
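
As a small worked example of how the table above can be read, the sketch below computes the OA gain and the relative computation of the last row (the proposed method) against three baselines that report both figures. The numbers are copied from the table; the dictionary layout and function names are illustrative only.

```python
# (OA in %, computation in GFLOPs), copied from the table above.
results = {
    "PointNet++": (90.7, 0.870),
    "DGCNN": (92.9, 2.450),
    "PCT": (93.2, 2.320),
    "Proposed": (93.5, 2.204),
}

def oa_gain(baseline, ours="Proposed"):
    """OA difference in percentage points (ours minus baseline)."""
    return round(results[ours][0] - results[baseline][0], 1)

def flops_ratio(baseline, ours="Proposed"):
    """Computation of ours relative to the baseline (below 1 means cheaper)."""
    return results[ours][1] / results[baseline][1]

for name in ("PointNet++", "DGCNN", "PCT"):
    print(name, oa_gain(name), round(flops_ratio(name), 2))
```

On these figures the proposed method gains 2.8, 0.6 and 0.3 percentage points of OA over PointNet++, DGCNN and PCT respectively, while needing about 2.5 times the computation of PointNet++ and slightly less than DGCNN and PCT.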

Per-category IoU, Cls.mIoU and mIoU are given in %.
Method	Airplane	Bag	Cap	Car	Chair	Earphone	Guitar	Knife	Lamp	Laptop	Motorbike	Mug	Pistol	Rocket	Skateboard	Table	Cls.mIoU	mIoU
PointNet	83.4	78.7	82.5	74.9	89.6	73.0	91.5	85.9	80.8	95.3	65.2	93.0	81.2	57.9	72.8	80.6	80.4	83.7
PointNet++	82.4	79.0	87.7	77.3	90.8	71.8	91.0	85.9	83.7	95.3	71.6	94.1	81.3	58.7	76.4	82.6	81.9	85.1
DGCNN	84.0	83.4	86.7	77.8	90.6	74.7	91.2	87.5	82.8	95.7	66.3	94.9	81.1	63.5	74.5	82.6	82.3	85.2
LDGCNN	84.0	83.0	84.9	78.4	90.6	74.4	91.0	88.1	83.4	95.8	67.4	94.9	82.3	59.2	76.0	81.9	82.2	85.1
PCNN	82.4	80.1	85.5	79.5	90.8	73.2	91.3	86.0	85.0	95.7	73.2	94.8	83.3	51.0	75.0	81.8	81.8	85.1
PointASNL	84.1	84.7	87.9	79.7	92.2	73.7	91.0	87.2	84.2	95.8	74.4	95.2	81.0	63.0	76.3	83.2	83.3	86.1
Proposed method	83.3	85.3	90.4	77.6	90.8	78.1	91.4	88.0	85.1	95.9	72.5	95.3	82.2	63.7	77.8	83.1	83.8	86.4
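
The Cls.mIoU column can be cross-checked against the per-category values. The sketch below assumes that Cls.mIoU is the unweighted mean of the 16 per-category IoUs (the usual convention for part segmentation; the page does not state it) and recomputes it for three rows copied from the table above, the last one being the proposed method. Variable names are illustrative.

```python
# Per-category IoU (%) for three rows of the table above, in column order.
per_category_iou = {
    "PointNet": [83.4, 78.7, 82.5, 74.9, 89.6, 73.0, 91.5, 85.9,
                 80.8, 95.3, 65.2, 93.0, 81.2, 57.9, 72.8, 80.6],
    "DGCNN": [84.0, 83.4, 86.7, 77.8, 90.6, 74.7, 91.2, 87.5,
              82.8, 95.7, 66.3, 94.9, 81.1, 63.5, 74.5, 82.6],
    "Proposed": [83.3, 85.3, 90.4, 77.6, 90.8, 78.1, 91.4, 88.0,
                 85.1, 95.9, 72.5, 95.3, 82.2, 63.7, 77.8, 83.1],
}

# Cls.mIoU values as reported in the table.
reported_cls_miou = {"PointNet": 80.4, "DGCNN": 82.3, "Proposed": 83.8}

def class_mean_iou(values):
    """Unweighted mean over categories (assumed definition of Cls.mIoU)."""
    return sum(values) / len(values)

for name, values in per_category_iou.items():
    recomputed = round(class_mean_iou(values), 1)
    print(name, recomputed, recomputed == reported_cls_miou[name])
```

All three recomputed means agree with the reported Cls.mIoU after rounding to one decimal, which supports the assumed definition. The mIoU column cannot be recomputed this way, presumably because it averages over shape instances rather than categories.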

Method	mIoU/%	mAcc/%
PointNet	41.1	48.9
PointNet++	50.6	—
DGCNN	47.0	—
PCNN	57.3	63.9
Proposed method	59.1	65.3
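
For readers unfamiliar with the metrics in these tables, the sketch below shows one common way to compute OA, mAcc and mIoU from a confusion matrix. It uses the standard definitions and a toy two-class matrix; it is not taken from the paper's evaluation code, the function name is hypothetical, and it assumes every class occurs at least once so that no denominator is zero.

```python
def segmentation_metrics(conf):
    """conf[i][j] = number of points with true class i predicted as class j."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    ious, accs = [], []
    for c in range(k):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(k)) - tp   # predicted c, true class differs
        ious.append(tp / (tp + fp + fn))
        accs.append(tp / (tp + fn))
    return {
        "OA": sum(conf[c][c] for c in range(k)) / total,  # overall accuracy
        "mAcc": sum(accs) / k,                            # mean per-class accuracy
        "mIoU": sum(ious) / k,                            # mean per-class IoU
    }

toy_confusion = [[8, 2],
                 [1, 9]]
metrics = segmentation_metrics(toy_confusion)
print(metrics)
```

In the toy matrix the per-class IoUs are 8/11 and 9/12, so mIoU is lower than OA and mAcc (both 0.85), which illustrates why mIoU is the stricter of the three figures.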

Number of weight matrices	OA/%
2	92.2
4	92.4
8	93.4
16	93.0
