Few-shot recognition method of 3D models based on Transformer

doi:10.11772/j.issn.1001-9081.2022060952

Abstract

To address the classification of Three-Dimensional (3D) models from only a few labeled samples, a Transformer-based few-shot recognition method for 3D models is proposed. First, the 3D point cloud models of the support and query samples are fed into a feature extraction module to obtain feature vectors. Next, the attention features of the support samples are computed in a Transformer module. Finally, a cosine similarity network computes the relation scores between the query samples and the support samples. On the ModelNet 40 dataset, the proposed method improves 5-way 1-shot and 5-way 5-shot recognition accuracy over the Dual-Long Short-Term Memory (Dual-LSTM) method by 34.54 and 21.00 percentage points, respectively. The proposed method also achieves high accuracy on the ShapeNet Core dataset. The experimental results show that the proposed method recognizes new categories of 3D models more accurately than the compared methods.
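The three stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: random vectors stand in for the output of the point cloud feature extraction module, the Transformer module is reduced to one single-head self-attention layer with a residual connection, and all names (`self_attention`, `cosine_scores`, `w_q`, `w_k`, `w_v`) and sizes are assumptions made for illustration only.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a residual connection.

    x has shape (n_support, dim); each row is one support feature vector.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return x + weights @ v


def cosine_scores(query, support, eps=1e-8):
    """Cosine similarity between every query row and every support row."""
    q = query / (np.linalg.norm(query, axis=-1, keepdims=True) + eps)
    s = support / (np.linalg.norm(support, axis=-1, keepdims=True) + eps)
    return q @ s.T


rng = np.random.default_rng(0)
n_way, dim = 5, 16  # hypothetical 5-way 1-shot episode with 16-D features

# Stand-ins for extracted features (assumption: no real point clouds here).
support_feat = rng.normal(size=(n_way, dim))
query_feat = support_feat + 0.05 * rng.normal(size=(n_way, dim))

# Randomly initialised projections; a trained model would learn these.
w_q, w_k, w_v = (0.1 * rng.normal(size=(dim, dim)) for _ in range(3))

attended = self_attention(support_feat, w_q, w_k, w_v)  # attention features
scores = cosine_scores(query_feat, attended)            # relation score matrix
pred = scores.argmax(axis=1)                            # predicted class per query
print(scores.shape, pred)
```

Each row of `scores` holds one query's relation scores against the `n_way` support classes, and the predicted class is the column with the highest score. In a trained system the projections and the feature extractor would be learned jointly over many sampled episodes.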

Key words: few-shot recognition, Three-Dimensional (3D) model, attention mechanism, point cloud neural network, meta-learning

CLC Number:

TP391.41

Hui WANG, Jianhong LI. Few-shot recognition method of 3D models based on Transformer[J]. Journal of Computer Applications, 2023, 43(6): 1750-1758.

Figures/Tables 19

Fig. 1 Framework of few-shot recognition method of 3D models based on Transformer

Fig. 2 Framework of feature extraction module

Fig. 3 Structure of Transformer module

Fig. 4 Visualization of some models in ModelNet 40 dataset

Fig. 5 Visualization of some models in ShapeNet Core dataset

Fig. 6 Point cloud models with different numbers of points

Fig. 7 Prediction accuracy varying with different sampling point numbers

Tab. 1 Accuracy of 1-shot experiments at different sampling point numbers (unit: %)

Number of sampling points	ModelNet 40, 3-way	ModelNet 40, 5-way	ShapeNet Core.v2, 3-way	ShapeNet Core.v2, 5-way	ShapeNet Core_normal, 3-way	ShapeNet Core_normal, 5-way
256	83.28	78.25	80.12	79.06	85.99	80.86
512	86.59	79.06	80.28	79.63	96.05	84.39
1024	87.37	80.86	81.75	81.51	83.96	82.25
2048	87.21	81.32	80.63	79.96	78.68	81.01

Fig. 8 Curves of loss changing with the number of iterations at different sampling point numbers

Tab. 2 Accuracies of the proposed method in 1-shot experiments on ModelNet 40 and ShapeNet Core datasets (unit: %)

Dataset	3-way	5-way
ModelNet 40	87.37	80.86
ShapeNet Core.v2	81.75	81.51
ShapeNet Core_normal	83.96	82.25

Fig. 9 Curves of loss changing with C value on different datasets

Tab. 3 Accuracies of 5-way K-shot experiments on ModelNet 40 and ShapeNet Core_normal datasets (unit: %)

Dataset	K=1	K=2	K=5	K=10
ModelNet 40	80.86	81.25	83.77	84.21
ShapeNet Core_normal	82.25	83.96	85.31	85.76

Tab. 4 Recognition accuracies at different λ values (unit: %)

Dataset	$λ = 0$	$λ = 0.0001$	$λ = 0.01$	$λ = 0.1$	$λ = 1$
ShapeNet Core.v2	79.33	82.32	80.57	79.18	79.62
ShapeNet Core_normal	80.44	85.31	83.75	81.64	81.28
ModelNet 40	78.53	83.77	81.19	80.43	80.01

Tab. 5 Few-shot recognition accuracies of different deep learning methods on ModelNet 40 dataset (unit: %)

Method	5-way 10-shot	5-way 20-shot	10-way 10-shot	10-way 20-shot
DGCNN+cTree [40]	60.00	65.70	48.50	53.00
PointNet+cTree [40]	63.20	68.90	49.20	50.10
PointNet+Jigsaw [41]	66.50	69.20	56.90	66.50
Proposed method	84.21	81.53	80.32	80.75

Tab. 6 Five-way accuracies of different few-shot recognition methods of 3D models on ModelNet 40 dataset (unit: %)

Method	5-way 1-shot	5-way 5-shot
Dual-LSTM [16]	46.32	62.77
Relation network	70.27	72.13
Network without Transformer	35.56	36.92
Proposed method	80.86	83.77
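The improvements quoted in the abstract can be checked against the Dual-LSTM row and the proposed method's row of Tab. 6. The short sketch below only subtracts the tabulated accuracies; the variable names are chosen for illustration.

```python
# Accuracies (%) copied from Tab. 6 (ModelNet 40, 5-way).
dual_lstm = {"1-shot": 46.32, "5-shot": 62.77}
proposed = {"1-shot": 80.86, "5-shot": 83.77}

# Improvement in percentage points, rounded to two decimals.
gains = {k: round(proposed[k] - dual_lstm[k], 2) for k in proposed}
print(gains)
```

The differences are 34.54 and 21.00 percentage points, which agrees with the figures stated in the abstract.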

Fig. 10 Relation score matrix

Fig. 11 Experimental results of 3-way 1-shot 1-query

Fig. 12 Experimental results of 5-way 1-shot 1-query

Fig. 13 Example of failure results

References 41

1	WANG Y K, XU C M, LIU C, et al. Instance credibility inference for few-shot learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 12833-12842. doi:10.1109/cvpr42600.2020.01285
2	ZHAO K L, JIN X L, WANG Y Z. Survey on few-shot learning[J]. Journal of Software, 2021, 32(2): 349-369. doi:10.13328/j.cnki.jos.006138
3	YANG J C, GUO X L, LI Y, et al. A survey of few-shot learning in smart agriculture: developments, applications, and challenges[J]. Plant Methods, 2022, 18: No.28. doi:10.1186/s13007-022-00866-2
4	SA L B, YU C C, MA X Q, et al. Attentive fine-grained recognition for cross-domain few-shot classification[J]. Neural Computing and Applications, 2022, 34(6): 4733-4746. doi:10.1007/s00521-021-06627-x
5	SUN W Y, JIN Z, ZHAO H T, et al. Cross-domain few-shot face spoofing detection method based on deep feature augmentation[J]. Computer Science, 2021, 48(2): 330-336. doi:10.11896/jsjkx.200100020
6	SHOME D, KAR T. FedAffect: few-shot federated learning for facial expression recognition[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 4151-4158. doi:10.1109/iccvw54120.2021.00463
7	YIN L, ZHOU Q. Research on monthly power supply forecasting based on small sample data and deep residual network[J]. Computer and Digital Engineering, 2022, 50(2): 448-452. doi:10.3969/j.issn.1672-9722.2022.02.042
8	DONG Y, PAN H W, CUI Q N, et al. Few-shot segmentation method for multi-modal magnetic resonance images of brain tumor[J]. Journal of Computer Applications, 2021, 41(4): 1049-1054. doi:10.11772/j.issn.1001-9081.2020081388
9	LIU Y, LEI Y B, FAN J L, et al. Survey on image classification technology based on small sample learning[J]. Acta Automatica Sinica, 2021, 47(2): 297-315. doi:10.16383/j.aas.c190720
10	MA J W, XIE H C, HAN G X, et al. Partner-assisted learning for few-shot image classification[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10553-10562. doi:10.1109/iccv48922.2021.01040
11	YANG F Y, WANG R P, CHEN X L. SEGA: semantic guided attention on visual prototype for few-shot learning[C]// Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2022: 1586-1596. doi:10.1109/wacv51458.2022.00165
12	WERTHEIMER D, TANG L, HARIHARAN B. Few-shot classification with feature map reconstruction networks[C]// Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8012-8021. doi:10.1109/cvpr46437.2021.00792
13	ADAMKIEWICZ M, CHEN T, CACCAVALE A, et al. Vision-only robot navigation in a neural radiance world[J]. IEEE Robotics and Automation Letters, 2022, 7(2): 4606-4613. doi:10.1109/lra.2022.3150497
14	WANG H P, LI Z B, WANG L. Analysis and implementation of virtual traffic signal system for autopilot simulation[J]. Automobile Applied Technology, 2020(7): 34-37.
15	CHEN T, QIU E H, KONG J H, et al. Application of virtual reality technology based on CAD in hydropower station simulation system[J]. Computer and Digital Engineering, 2021, 49(4): 856-861. doi:10.3969/j.issn.1672-9722.2021.04.047
16	NIE J, XU N, ZHOU M, et al. 3D model classification based on few-shot learning[J]. Neurocomputing, 2020, 398: 539-546. doi:10.1016/j.neucom.2019.03.105
17	WU Z R, SONG S R, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920. doi:10.1109/cvpr.2015.7298801
18	DENG Y, YANG J L, TONG X. Deformed implicit field: modeling 3D shapes with learned dense correspondence[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10281-10291. doi:10.1109/cvpr46437.2021.01015
19	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5105-5114.
20	LIU B, KANG H, LI H X, et al. Few-shot open-set recognition using meta-learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8795-8804. doi:10.1109/cvpr42600.2020.00882
21	BAIK S, CHOI M, CHOI J, et al. Meta-learning with adaptive hyperparameters[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 20755-20765.
22	BAIK S, HONG S, LEE K M. Learning to forget for meta-learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2376-2384. doi:10.1109/cvpr42600.2020.00245
23	BAIK S, CHOI J, KIM H, et al. Meta-learning with task-adaptive loss function for few-shot learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9445-9454. doi:10.1109/iccv48922.2021.00933
24	GIDARIS S, KOMODAKIS N. Dynamic few-shot visual learning without forgetting[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4367-4375. doi:10.1109/cvpr.2018.00459
25	GIDARIS S, KOMODAKIS N. Generating classification weights with GNN Denoising Autoencoders for few-shot learning[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 21-30. doi:10.1109/cvpr.2019.00011
26	HARIHARAN B, GIRSHICK R. Low-shot visual recognition by shrinking and hallucinating features[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3037-3046. doi:10.1109/iccv.2017.328
27	MUNKHDALAI T, YUAN X D, MEHRI S. Rapid adaptation with conditionally shifted neurons[C]// Proceedings of the 35th International Conference on Machine Learning. New York: JMLR.org, 2018: 3664-3673.
28	SCHWARTZ E, KARLINSKY L, SHTOK J, et al. Δ-encoder: an effective sample synthesis method for few-shot object recognition[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 2850-2860.
29	GARCIA V, BRUNA J. Few-shot learning with graph neural networks[EB/OL]. (2018-02-20) [2022-04-12].
30	YANG L, LI L L, ZHANG Z L, et al. DPGN: distribution propagation graph network for few-shot learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13387-13396. doi:10.1109/cvpr42600.2020.01340
31	HAN X F, LEUNG T, JIA Y Q, et al. MatchNet: unifying feature and metric learning for patch-based matching[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3279-3286. doi:10.1109/cvpr.2015.7298948
32	LI W B, WANG L, XU J L, et al. Revisiting local descriptor based image-to-class measure for few-shot learning[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7253-7260. doi:10.1109/cvpr.2019.00743
33	LI H Y, EIGEN D, DODGE S, et al. Finding task-relevant features for few-shot learning by category traversal[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1-10. doi:10.1109/cvpr.2019.00009
34	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 77-85. doi:10.1109/cvpr.2017.16
35	REN M Y, TRIANTAFILLOU E, RAVI S, et al. Meta-learning for semi-supervised few-shot classification[EB/OL]. (2018-03-02) [2022-04-12].
36	SUNG F, YANG Y X, ZHANG L, et al. Learning to compare: relation network for few-shot learning[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1199-1208. doi:10.1109/cvpr.2018.00131
37	TANG S X, CHEN D P, BAI L, et al. Mutual CRF-GNN for few-shot learning[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 2329-2339. doi:10.1109/cvpr46437.2021.00236
38	DOERSCH C, GUPTA A, ZISSERMAN A. CrossTransformers: spatially-aware few-shot transfer[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 21981-21993.
39	YE H J, HU H X, ZHAN D C, et al. Few-shot learning via embedding adaptation with set-to-set functions[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8805-8814. doi:10.1109/cvpr42600.2020.00883
40	SHARMA C, KAUL M. Self-supervised few-shot learning on point clouds[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 7212-7221.
41	SAUDER J, SIEVERS B. Self-supervised deep learning on point clouds by reconstructing space[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2019: 12962-12972.
