Journal of Computer Applications


Cross-Attention Multi-Modal Point Cloud Completion Network

  

  • Received: 2025-02-28  Revised: 2025-03-20  Online: 2025-03-24  Published: 2025-03-24


LIAO Zexin 1, ZHANG Shaobing 2, CHENG Miao 2

  1. Chengdu Institute of Computer Applications, University of Chinese Academy of Sciences
  2. Shenzhen Zhongchao Kexin Financial Technology Co., Ltd.
  • Corresponding author: LIAO Zexin

Abstract: To address the incompleteness of point clouds caused by occlusion, surface concavities, illumination, and other factors when objects are scanned with LiDAR and similar devices, this study proposes a 2D-view-guided 3D point cloud completion method. Existing view-guided approaches typically fuse 2D and 3D information either explicitly or implicitly. Since the fused features should simultaneously retain the global information carried by the 2D view and the local details needed for point cloud refinement, we design a novel network that integrates 2D views with point clouds for completion. The network operates in two stages: feature extraction and encoding, followed by feature decoding and point generation. In the first stage, DGCNN and ResNet extract features from the point cloud and the 2D view, respectively. These features are then fused via a cross-attention mechanism into hybrid representations, which are downsampled to obtain global features. In the second stage, the fused features are decoded with attention mechanisms and upsampled via transposed convolutions to reconstruct complete point clouds. Experimental results demonstrate that our method achieves better Chamfer Distance (CD) scores on the ShapeNet-ViPC dataset than state-of-the-art single-modal and multi-modal point cloud completion approaches.
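The two core operations named in the abstract, cross-attention fusion of point features with view features and the Chamfer Distance evaluation metric, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature shapes, the single-head scaled dot-product form of the attention, and the squared-distance variant of CD are all assumptions for the sake of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(point_feats, view_feats):
    """Fuse per-point queries with 2D-view keys/values.

    point_feats: (N, C) point features (e.g. from a DGCNN backbone)
    view_feats:  (M, C) view-patch features (e.g. from a ResNet backbone)
    Returns (N, C) fused features: each point attends over all view patches.
    """
    d_k = point_feats.shape[-1]
    scores = point_feats @ view_feats.T / np.sqrt(d_k)  # (N, M) similarity
    attn = softmax(scores, axis=-1)                     # rows sum to 1
    return attn @ view_feats                            # (N, C) weighted sum

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p: (N, 3), q: (M, 3)."""
    d = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # (N, M) squared dists
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

In a full network, the query/key/value features would pass through learned projections and multiple heads, and CD would be computed between the generated and ground-truth clouds as both a training loss and the evaluation metric; the sketch keeps only the arithmetic skeleton of each step.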

Key words: point cloud, point cloud completion, multi-modality, self-supervised, cross-attention, geometry-aware


