Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (10): 2995-3010. DOI: 10.11772/j.issn.1001-9081.2023101415
• Artificial intelligence •
Wenze CHAI1,2,3, Jing FAN1,2,3, Shukui SUN1,2,3, Yiming LIANG1,2,3, Jingfeng LIU1,2,3
Received: 2023-10-19
Revised: 2024-02-05
Accepted: 2024-02-06
Online: 2024-10-15
Published: 2024-10-10
Contact: Jing FAN
About author: CHAI Wenze, born in 1998 in Shuozhou, Shanxi, M. S. candidate, CCF member. His research interests include deep learning and image classification.
Wenze CHAI, Jing FAN, Shukui SUN, Yiming LIANG, Jingfeng LIU. Overview of deep metric learning[J]. Journal of Computer Applications, 2024, 44(10): 2995-3010.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023101415
Category | Method | Advantages | Disadvantages | Applications
---|---|---|---|---
Triplet Loss | Standard Triplet Loss[ | Simple and easy to use, with low computational overhead | Easily affected by easy and hard samples, which are difficult to balance | Face recognition, image retrieval
 | Semi-Hard Triplet Loss[ | More robust than the standard triplet loss | Still struggles to escape the interference of hard samples | Face recognition, pedestrian recognition
 | Hard Negative Triplet Loss[ | Solves the triplet loss's difficulty in escaping the interference of hard samples | Tends to ignore information from other negative samples, so generalization may suffer | Face recognition, pedestrian recognition
 | Adaptive Sampling Triplet Loss[ | Dynamically adjusts the number of positive and negative pairs to balance the influence of easy and hard samples on the loss | Needs more computational resources, and its parameters are relatively complex to set | Face recognition, image retrieval
 | Soft-Hard Triplet Loss[ | Introduces soft labels to improve the stability of the loss and the quality of classification boundaries | May need extra annotation, and training on large-scale datasets can be computationally expensive | Image recognition, face recognition, object detection
Contrastive Loss | Contrastive Loss[ | Learns both similarity and dissimilarity between samples with explicit positive/negative contrast; simple to implement | The number and variety of positive and negative samples must be balanced to avoid over- or under-fitting | Image retrieval, text matching, feature learning
 | N-pair Loss[ | Effectively balances easy and hard samples and improves training | Higher computational overhead than the triplet loss, since more sample pairs must be constructed | Large-scale image classification, semantics-related tasks
 | Non-local Triplet Loss[ | Increases the diversity among training samples and improves the discriminability of feature representations | High computational overhead; needs extra sample mining and selection strategies | Image retrieval, video analysis, person re-identification
 | Multi-modal Contrastive Loss[ | Learns cross-modal semantic alignment | Sample selection and cross-modal alignment are complex, demanding careful data preprocessing and model design | Image-text retrieval, multimodal reasoning, cross-modal retrieval
 | Adaptive Weighted Contrastive Loss[ | Improves model robustness, sharpens discrimination at sample boundaries, and reduces dependence on sample selection | Sensitive to parameter choices; high computational complexity | Face recognition, object recognition, recommender systems
 | In-batch Contrastive Learning[ | Reduces negative-sample mining and speeds up training | Needs a suitable batch size and sampling strategy; small batches may introduce noise | Image classification, text classification
 | Generative Contrastive Loss[ | Combines contrastive learning with generative models to learn discriminative and generative features simultaneously | Needs a suitable architecture and training strategy; relatively high computational complexity | Image generation, semantic segmentation, image inpainting
Center Loss | Contrastive Center Loss[ | Suitable when sample distributions are distinct but classes overlap considerably | Unsuitable when the class embedding space has complex structure; may lead to excessive intra-class variance | Image classification, face recognition
 | Triplet Center Loss[ | Learns more discriminative representations by comparing distances among anchor, positive, and negative samples | Sensitive to hard-sample selection; needs a carefully designed sampling strategy | Face recognition, object detection
 | Quadruplet Center Loss[ | Introduces an extra easy negative compared with the triplet loss, improving discriminability | Higher complexity; pair construction and selection must be finer, with correspondingly higher computational cost | Face recognition, person re-identification
Cross-Entropy Loss[ | — | Widely used in classification; easy to compute and implement | May not be flexible enough for metric learning, since it ignores the concrete distances between samples | —
Region Alignment Loss | Local Triplet Loss[ | Builds triplets via local sampling and nearest-neighbour selection; low computational overhead on large-scale datasets | May focus only on each sample's local region and ignore global features; sensitive to the sampling strategy | Face recognition
Margin Loss | Hierarchical Margin Loss[ | Trains samples of different difficulty levels at different tiers, balancing the training effect between easy and hard samples | Needs a suitable tier partition in advance, tuned per task; sensitive to initial settings | Image retrieval
 | Predictive Margin Loss[ | Introduces extra classification labels as supervision, improving inter-sample separability while optimizing feature vectors | Needs extra label information, and training on large-scale datasets can be computationally expensive | Face recognition
 | Deep Ensemble Metric Learning[ | Exploits the nonlinearity of deep models to improve metric-learning performance | Complex models that need substantial computation and data | Image retrieval, text matching
Tab. 1 Deep metric learning methods based on sample pairs
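For orientation, the snippet below is a minimal NumPy sketch (not taken from any of the surveyed papers) of the two most basic sample-pair losses in Tab. 1: the standard triplet loss and the pairwise contrastive loss. The Euclidean distance and the margin values are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive closer to the anchor
    than the negative, by at least `margin` (illustrative value)."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)
    d_an = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

def contrastive_loss(x1, x2, same_class, margin=1.0):
    """Pairwise contrastive loss: positive pairs are pulled together,
    negative pairs are pushed apart up to `margin`."""
    d = np.linalg.norm(x1 - x2, axis=-1)
    pos = same_class * d ** 2
    neg = (1 - same_class) * np.maximum(margin - d, 0.0) ** 2
    return 0.5 * (pos + neg).mean()

# Toy usage on random 128-dimensional embeddings.
rng = np.random.default_rng(0)
a, p, n = rng.normal(size=(3, 8, 128))
print(triplet_loss(a, p, n))
print(contrastive_loss(a, p, same_class=np.ones(8)))
```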
Category | Method | Advantages | Disadvantages | Applications
---|---|---|---|---
Model-optimization based | Multi-Proxy Loss[ | Comparison with multiple proxies adds semantic information and improves classification performance | Increases the cost of data collection and annotation as well as the computational complexity | Multi-label image classification, face recognition
 | Proxy Regularization[ | Constrains the distances between features and proxies, helping the model learn classification decision boundaries | Poorly chosen proxy samples may make the model learn ineffective features or introduce bias | Image classification, object detection
 | Proxy Rebalancing[ | Lets the model learn the features and decision boundaries of every class in a more balanced way | Inaccurate label information may degrade performance or introduce new bias | Rare-disease diagnosis, fraud detection, anomaly detection
 | Meta-learning with Proxies[ | Helps the model adapt faster to new tasks or domains | Incurs some data collection and annotation cost | Natural language processing, object recognition
 | Multi-Proxies for Metric Learning[ | Comparison with multiple proxies improves robustness to noise and outliers in the dataset | May overfit on some small datasets | Face recognition, product recommendation
 | Domain Adaptive Person Re-identification[ | Can handle inconsistent data distributions | Needs explicit domain labels or domain knowledge to handle the adaptation | Cross-domain audio classification, person re-identification
 | Proxy Networks[ | Handles the data scarcity of few-shot learning well | Performance may drop when samples are unevenly distributed or noisy | Face recognition, object detection
 | Generalized Zero-Shot Learning[ | Addresses missing classes and data imbalance in zero-shot learning | For tasks with complex features and inter-class relations, synthesized samples may not fully cover the real cases | Image recognition, natural language processing
Distance based | Proxy-NCA Loss[ | Effective for large-scale classification tasks | May overfit on small datasets | Face recognition, image classification
 | Proxy Alignment[ | Improves classification accuracy through the alignment between samples and class proxies | May fail to learn good alignment when classes overlap or labels are noisy | Image classification, object detection
 | Proxy Triplet Loss[ | Works well on class-imbalanced datasets | Sample difficulty may be unevenly distributed | Face verification, image retrieval
 | Proxy Regularization[ | Improves feature discriminability by constraining the relation between proxies and features | May underfit on smaller datasets | Image classification, object detection, face recognition
 | Proxy-based Contrastive Learning[ | Needs no labeled class information, so it suits unsupervised or weakly supervised learning | Negative-sample selection may be difficult on small datasets | Image retrieval, clustering, feature learning
 | Proxy Anchor Loss[ | Adapts to class imbalance and noise | Relatively high computational complexity; may be inefficient on large-scale datasets | Image classification, object detection, face recognition
Proxy-vector based | Proxy Selection[ | Low computation and storage overhead; efficient models | Unsuitable for tasks that need complete proxy information | Training on large-scale datasets
 | Proxy Subset Selection[ | Retains key information while reducing noise and redundant samples | A poorly chosen proxy subset may hurt model performance | Training on large-scale datasets
 | Proxy Guided Network Pruning[ | Uses proxy information to guide pruning, retaining important features and connections | May introduce some error and performance loss | Model compression and acceleration
 | Proxy-based Anomaly Detection[ | Lowers the false-alarm rate caused by noisy, redundant, and skewed data | The choice of proxy samples may affect robustness and detection performance | Network intrusion detection, anomaly detection
 | Reduced-parameter Proxy Networks[ | Effectively controls the scale and complexity of neural architecture search | With fewer parameters, the proxy network may introduce error and affect model performance | Neural architecture search
 | Deep Neural Networks[ | Proxy samples guide the model towards more stable feature representations and decision boundaries | The choice of proxy samples may affect performance and generalization | —
 | Align Classifiers with Applications[ | Proxy samples guide feature-alignment learning across different classifiers | The choice of proxy samples may affect recognition performance and robustness | Face recognition
 | D-VAE (DAG Variational Autoencoder)[ | Effectively models Directed Acyclic Graph (DAG) data | The complexity and computational cost of DAG data may make the model complex and hard to train | Drug prediction
Embedding-vector optimization | Proxy Discrimination[ | Improves the performance of image classification and object detection | Proxy-label accuracy may suffer in noisy or confusing scenarios | Image classification, object detection
 | Metric embedding[ | Increases the contrast between samples, improving the discriminability of feature representations | Multi-batch sample learning may be affected by noise and errors | Face recognition
Tab. 2 Deep metric learning methods based on proxies
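Most methods in Tab. 2 replace mined sample pairs with learnable class proxies that sample embeddings are compared against. As a rough, hedged illustration of that idea, the following sketch computes a Proxy-NCA-style loss; the squared Euclidean distance and the choice of one proxy per class are simplifying assumptions, not the exact formulation of any specific cited paper.

```python
import numpy as np

def proxy_nca_loss(embeddings, labels, proxies):
    """Proxy-NCA-style loss: each embedding should be closer to the
    proxy of its own class than to the proxies of all other classes.

    embeddings: (N, D) sample embeddings
    labels:     (N,)   integer class labels
    proxies:    (C, D) one learnable proxy per class (assumption)
    """
    # Squared Euclidean distance between every embedding and every proxy.
    diff = embeddings[:, None, :] - proxies[None, :, :]        # (N, C, D)
    dist = (diff ** 2).sum(axis=-1)                            # (N, C)
    # Softmax over negative distances gives the probability of the correct proxy.
    logits = -dist
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(16, 64))
prx = rng.normal(size=(5, 64))      # 5 classes
lab = rng.integers(0, 5, size=16)
print(proxy_nca_loss(emb, lab, prx))
```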
Dataset | Details
---|---
CUB-200-2011 | 200 classes and 11 788 images in total: the first 100 classes are used for training (5 864 images) and the remaining classes for testing (5 924 images)
Cars196 | 196 classes and 16 185 images in total: the first 98 classes are used for training (8 054 images) and the other 98 classes for testing (8 131 images)
SOP | 22 634 products and 120 053 images in total: the first 11 318 product classes are used for training (59 551 images) and the last 11 316 product classes for testing (60 502 images)
Market1501 | 1 501 identities in total: 750 identities for training and 751 for testing
CUHK03-NP | 1 360 identities in total: 1 160 for training, 100 for testing, and 100 for validation
MSMT17 | 4 101 identities and 126 441 bounding boxes in total, drawn from 180 h of video captured by 12 outdoor cameras and 3 indoor cameras over 12 time slots
VoxCeleb1 | the verification development set contains 1 211 speakers, 21 819 videos, and 148 642 utterances; the test set contains 40 speakers, 677 videos, and 4 874 utterances
VoxCeleb1-E | an extended version of VoxCeleb1 with 1 251 speakers and 581 480 test pairs in total
Tab. 3 Details of classic datasets
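The retrieval benchmarks in Tab. 3 (CUB-200-2011, Cars196, SOP) use disjoint training and test classes, e.g. the first 100 of CUB's 200 classes for training and the rest for testing. A minimal sketch of such a split is given below; the `image_labels` input is a hypothetical array of per-image class labels, not part of any official toolkit, and classes are assumed to be numbered from zero.

```python
import numpy as np

def disjoint_class_split(image_labels, num_train_classes):
    """Split image indices so that training and test classes do not overlap,
    as in CUB-200-2011 (first 100 classes train, last 100 classes test)."""
    image_labels = np.asarray(image_labels)
    train_mask = image_labels < num_train_classes   # assumes 0-indexed, ordered classes
    return np.where(train_mask)[0], np.where(~train_mask)[0]

# Toy example with 10 classes and 3 images per class.
labels = np.repeat(np.arange(10), 3)
train_idx, test_idx = disjoint_class_split(labels, num_train_classes=5)
print(len(train_idx), len(test_idx))   # 15 15
```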
Method | Embedding dimension | CUB-200-2011: NMI/% | CUB-200-2011: R@1/% | CUB-200-2011: R@2/% | CUB-200-2011: R@4/% | CUB-200-2011: R@8/% | Cars196: NMI/% | Cars196: R@1/% | Cars196: R@2/% | Cars196: R@4/% | Cars196: R@8/%
---|---|---|---|---|---|---|---|---|---|---|---
Circleloss[ | 512 | — | 66.7 | 77.4 | 86.2 | 91.2 | — | 83.4 | 89.8 | 94.1 | 96.5
ProxyAnchor[ | 512 | — | 68.4 | 79.2 | — | — | — | 86.8 | 91.6 | — | —
ProxyGML[ | 512 | 69.8 | 66.6 | 77.6 | — | — | 72.4 | 85.5 | 91.8 | — | —
DRML[ | 512 | 69.3 | 68.7 | 78.6 | — | — | 72.1 | 86.9 | 92.1 | — | —
PA+MemVir[ | 512 | — | 69.0 | 79.2 | — | — | — | 86.7 | 92.0 | — | —
HIST[ | 512 | 70.8 | 69.7 | 80.0 | 87.3 | — | 73.0 | 87.4 | 92.5 | 95.4 | —
PA+NIR[ | 512 | 71.0 | 70.1 | 80.1 | — | — | 73.7 | 87.9 | 92.8 | — | —
CCL[ | — | — | 71.8 | 80.8 | 87.8 | — | — | 89.6 | 93.9 | 96.4 | —
Tab. 4 Performance of classic methods on CUB-200-2011 and Cars196 datasets
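Tab. 4 and Tab. 5 report NMI and Recall@K (R@K) on the test split. The sketch below shows one common way to compute these metrics from test embeddings, assuming Euclidean nearest-neighbour retrieval and scikit-learn's k-means and NMI; the clustering and distance details may differ from those used in the individual papers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def recall_at_k(embeddings, labels, ks=(1, 2, 4, 8)):
    """R@K: fraction of queries whose K nearest neighbours
    (excluding the query itself) contain at least one same-class item."""
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # exclude the query itself
    order = np.argsort(d, axis=1)
    hits = labels[order] == labels[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

def nmi(embeddings, labels):
    """NMI between k-means cluster assignments and ground-truth classes."""
    pred = KMeans(n_clusters=len(set(labels)), n_init=10).fit_predict(embeddings)
    return normalized_mutual_info_score(labels, pred)

# Toy usage on random embeddings of 6 classes with 10 samples each.
rng = np.random.default_rng(0)
emb = rng.normal(size=(60, 32))
lab = np.repeat(np.arange(6), 10)
print(recall_at_k(emb, lab), nmi(emb, lab))
```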
Method | Embedding dimension | NMI/% | R@1/% | R@10/% | R@100/% | R@1 000/%
---|---|---|---|---|---|---
Circleloss[ | 512 | — | 83.4 | 89.8 | 94.1 | 96.5
ProxyAnchor[ | 512 | — | 79.1 | 90.8 | — | —
ProxyGML[ | 512 | 90.2 | 78.0 | 90.6 | — | —
DRML[ | 512 | 88.1 | 71.5 | 85.2 | — | —
PA+MemVir[ | 512 | — | 79.7 | 91.0 | — | —
HIST[ | 512 | 92.2 | 79.6 | 91.0 | 96.2 | —
PA+NIR[ | 512 | 90.2 | 79.3 | 90.4 | — | —
CCL[ | — | — | 82.3 | 93.0 | 97.4 | —
Tab. 5 Performance of classic methods on SOP dataset
Method | Market1501: R@1 | Market1501: mAP | CUHK03-NP: R@1 | CUHK03-NP: mAP | MSMT17: R@1 | MSMT17: mAP
---|---|---|---|---|---|---
M3L[ | 75.9 | 50.2 | 33.1 | 32.1 | 36.9 | 14.7
OSNet-AIN[ | 94.2 | 84.4 | — | — | 23.5 | 8.2
QAConv-GS[ | 91.6 | 75.5 | 19.1 | 18.1 | 45.9 | 17.2
QAConv-MS[ | — | — | 26.2 | 24.3 | 48.8 | 19.3
Tab. 6 Performance of classic methods on Market1501, CUHK03-NP and MSMT17 datasets
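Tab. 6 reports Rank-1 accuracy (R@1) and mean Average Precision (mAP) for cross-domain person re-identification. The sketch below computes both from query and gallery embeddings under the simplifying assumption that every gallery image is a valid candidate; real benchmarks additionally filter same-camera and junk images, so its outputs are not directly comparable to the table.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def reid_rank1_map(query_emb, query_ids, gallery_emb, gallery_ids):
    """Rank-1 accuracy and mAP for re-identification.
    Assumption: all gallery images are valid candidates (no junk filtering)."""
    # Cosine similarity between every query and every gallery embedding.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                                          # (Nq, Ng)
    rank1, aps = [], []
    for scores, qid in zip(sim, query_ids):
        relevant = (gallery_ids == qid).astype(int)
        if not relevant.any():                             # query with no gallery match
            continue
        rank1.append(relevant[np.argmax(scores)])
        aps.append(average_precision_score(relevant, scores))
    return float(np.mean(rank1)), float(np.mean(aps))

# Toy usage with 5 identities.
rng = np.random.default_rng(0)
q_emb, g_emb = rng.normal(size=(10, 64)), rng.normal(size=(50, 64))
q_ids, g_ids = rng.integers(0, 5, 10), rng.integers(0, 5, 50)
print(reid_rank1_map(q_emb, q_ids, g_emb, g_ids))
```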
Dataset | Method | EER/% | minDCF
---|---|---|---
VoxCeleb1 | ECAPA-TDNN[ | 0.87 | 0.107
 | Bayesian attn-8+Channel attn[ | 0.76 | 0.077
 | Bayesian attn-32+Channel attn[ | 0.74 | 0.076
VoxCeleb1-E | ECAPA-TDNN[ | 1.12 | 0.132
 | Bayesian attn-8+Channel attn[ | 1.08 | 0.079
 | Bayesian attn-32+Channel attn[ | 1.04 | 0.075
Tab. 7 Performance of classic methods on VoxCeleb1 and VoxCeleb1-E datasets
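Tab. 7 reports the Equal Error Rate (EER) and the minimum Detection Cost Function (minDCF) for speaker verification. The following sketch estimates both from trial scores and labels; the minDCF parameters (P_target = 0.01, C_miss = C_fa = 1) are common defaults assumed here and may not match the exact operating points used in the cited papers.

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """EER and normalized minDCF from verification trial scores.
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    The cost parameters are assumed defaults, not taken from the cited papers."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    # EER: operating point where false-accept and false-reject rates cross.
    idx = np.nanargmin(np.abs(fnr - fpr))
    eer = (fnr[idx] + fpr[idx]) / 2.0
    # Detection cost over all thresholds, normalized by the cost of a
    # trivial system that always accepts or always rejects.
    dcf = c_miss * fnr * p_target + c_fa * fpr * (1.0 - p_target)
    dcf_norm = dcf / min(c_miss * p_target, c_fa * (1.0 - p_target))
    return eer, dcf_norm.min()

# Toy usage with synthetic target/non-target score distributions.
rng = np.random.default_rng(0)
target = rng.normal(1.0, 1.0, 1000)
nontarget = rng.normal(-1.0, 1.0, 1000)
scores = np.concatenate([target, nontarget])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])
print(eer_min_dcf(scores, labels))
```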
1 | KAYA M, BİLGE H Ş. Deep metric learning: a survey[J]. Symmetry, 2019, 11(9): No.1066. |
2 | YU B, TAO D. Deep metric learning with tuplet margin loss[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6489-6498. |
3 | LIU H, CHENG J, WANG W, et al. The general pair-based weighting loss for deep metric learning[EB/OL]. (2019-05-30) [2023-10-15]. . |
4 | WANG X, HAN X, HUANG W, et al. Multi-similarity loss with general pair weighting for deep metric learning[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5017-5025. |
5 | BOUDIAF M, RONY J, ZIKO I M, et al. A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12351. Cham: Springer, 2020: 548-564. |
6 | YAN M, LI N. Borderline-margin loss based deep metric learning framework for imbalanced data[J]. Applied Intelligence, 2023, 53(2): 1487-1504. |
7 | SCHROFF F, KALENICHENKO D, PHILBIN J. FaceNet: a unified embedding for face recognition and clustering[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 815-823. |
8 | HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-21) [2023-10-15]. . |
9 | XUAN H, STYLIANOU A, LIU X, et al. Hard negative examples are hard, but useful[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12359. Cham: Springer, 2020: 126-142. |
10 | WU C Y, MANMATHA R, SMOLA A J, et al. Sampling matters in deep embedding learning[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2859-2867. |
11 | YU R, DOU Z, BAI S, et al. Hard-aware point-to-set deep metric for person re-identification[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11220. Cham: Springer, 2018: 196-212. |
12 | HADSELL R, CHOPRA S, LeCUN Y. Dimensionality reduction by learning an invariant mapping[C]// Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition — Volume 2. Piscataway: IEEE, 2006: 1735-1742. |
13 | SOHN K. Improved deep metric learning with multi-class N-pair loss objective[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 1857-1865. |
14 | LIN Y, ZHENG L, ZHENG Z, et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019, 95: 151-161. |
15 | LI J, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// Proceedings of the 35th Conference on Neural Information Processing Systems. New York: ACM, 2024: 9694-9705. |
16 | SHMELKOV K, SCHMID C, ALAHARI K. Incremental learning of object detectors without catastrophic forgetting[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3420-3429. |
17 | ZHANG Y, YANG J, TAN Z, et al. RelationMatch: matching in-batch relationships for semi-supervised learning[EB/OL]. (2023-05-30) [2023-10-15]. . |
18 | WU C, PFROMMER J, ZHOU M, et al. Generative-contrastive learning for self-supervised latent representations of 3D shapes from multi-modal Euclidean input[EB/OL]. (2023-01-11) [2023-10-15].. |
19 | WEN Y, ZHANG K, LI Z, et al. A discriminative feature learning approach for deep face recognition[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 499-515. |
20 | HE X, ZHOU Y, ZHOU Z, et al. Triplet-center loss for multi-view 3D object retrieval[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1945-1954. |
21 | YU J, FENG Y. Quadruplet-center loss for face verification[C]// Proceedings of the 2019 Chinese Automation Congress. Piscataway: IEEE, 2019: 5034-5039. |
22 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. |
23 | XU X, HE L, LU H, et al. Deep adversarial metric learning for cross-modal retrieval[J]. World Wide Web, 2019, 22(2): 657-672. |
24 | QIU Z, PAN Y, YAO T, et al. Deep semantic hashing with generative adversarial networks[EB/OL]. (2018-04-23) [2023-10-15].. |
25 | BOUTROS F, DAMER N, KIRCHBUCHNER F, et al. ElasticFace: elastic margin loss for deep face recognition[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2022: 1577-1586. |
26 | WANG N, ZHOU W, TIAN Q, et al. Multi-cue correlation filters for robust visual tracking[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4844-4853. |
27 | SONG H O, XIANG Y, JEGELKA S, et al. Deep metric learning via lifted structured feature embedding[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4004-4012. |
28 | SUN Y, CHENG C, ZHANG Y, et al. Circle loss: a unified perspective of pair similarity optimization[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6397-6406. |
29 | WANG X, HUA Y, KODIROV E, et al. Ranked list loss for deep metric learning[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5202-5211. |
30 | CAKIR F, HE K, XIA X, et al. Deep metric learning to rank[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1861-1870. |
31 | FU Z, MAO Z, YAN C, et al. Self-supervised synthesis ranking for deep metric learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4736-4750. |
32 | DUAN Y, ZHENG W, LIN X, et al. Deep adversarial metric learning[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2780-2789. |
33 | JACOB P, PICARD D, HISTACE A. Improving deep metric learning with virtual classes and examples mining[C]// Proceedings of the 2022 IEEE International Conference on Image Processing. Piscataway: IEEE, 2022: 2696-2700. |
34 | HUANG W, ZHANG S, ZHANG P, et al. Identity-aware facial expression recognition via deep metric learning based on synthesized images[J]. IEEE Transactions on Multimedia, 2022, 24: 3327-3339. |
35 | DENG X, WU W, WANG F. Deep metric learning for text data based on triplet network[J]. IOP Conference Series: Materials Science and Engineering, 2020, 806: No.012038. |
36 | MATTIES M A. Vector embeddings with subvector permutation invariance using a triplet enhanced autoencoder[EB/OL]. (2020-10-18) [2023-10-15].. |
37 | WANG J, WANG K C, LAW M T, et al. Centroid-based deep metric learning for speaker recognition[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 3652-3656. |
38 | MA Y, HE Y, ZHANG A, et al. CrossCBR: cross-view contrastive learning for bundle recommendation[C]// Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2022: 1233-1241. |
39 | SABERI-MOVAHED F, EBRAHIMPOUR M K, SABERI-MOVAHED F, et al. Deep metric learning with soft orthogonal proxies[EB/OL]. (2023-06-22) [2023-10-15].. |
40 | GU G, KO B, KIM H G. Proxy Synthesis: learning with synthetic classes for deep metric learning[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 1460-1468. |
41 | ZHAI A, WU H Y. Classification is a strong baseline for deep metric learning[C]// Proceedings of the 2019 British Machine Vision Conference. Durham: BMVA Press, 2019: No.1206. |
42 | ZHAO X, QI H, LUO R, et al. A weakly supervised adaptive triplet loss for deep metric learning[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 3177-3180. |
43 | FEHERVARI I, MACEDO I. Adaptive additive classification-based loss for deep metric learning[EB/OL]. (2020-06-25) [2023-10-15].. |
44 | KIM M, GUERRERO R, PHAM H X, et al. Variational continual proxy-anchor for deep metric learning[C]// Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. New York: JMLR.org, 2022: 4552-4573. |
45 | PORZI L, HOFINGER M, RUIZ I, et al. Learning multi-object tracking and segmentation from automatic annotations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6845-6854. |
46 | ZHANG W, OUYANG W, LI W, et al. Collaborative and adversarial network for unsupervised domain adaptation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 3801-3809. |
47 | BENGAR J Z, VAN DE WEIJER J, FUENTES L L, et al. Class-balanced active learning for image classification[C]// Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2022: 3708-3716. |
48 | FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1126-1135. |
49 | SAEKI S, KAWAHARA M, AMAN H. Multi proxy anchor family loss for several types of gradients[J]. Computer Vision and Image Understanding, 2023, 229: No.103654. |
50 | LIU X, ZHANG S. Domain adaptive person re-identification via coupling optimization[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 547-555. |
51 | XIAO B, LIU C L, HSAIO W H. Proxy network for few shot learning[C]// Proceedings of the 12th Asian Conference on Machine Learning. New York: JMLR.org, 2020: 657-672. |
52 | VERMA V K, ARORA G, MISHRA A, et al. Generalized zero-shot learning via synthesized examples[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4281-4289. |
53 | MOVSHOVITZ-ATTIAS Y, TOSHEV A, LEUNG T K, et al. No fuss distance metric learning using proxies[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 360-368. |
54 | LI S, GAO P, TAN X, et al. ProxyFormer: proxy alignment assisted point cloud completion with missing part sensitive transformer[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 9466-9475. |
55 | WU H, SHEN F, ZHU J, et al. A sample-proxy dual triplet loss function for object re-identification[J]. IET Image Processing, 2022, 16(14): 3781-3789. |
56 | ROTH K, VINYALS O, AKATA Z. Non-isotropy regularization for proxy-based deep metric learning[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 7410-7420. |
57 | CHEN T, KORNBLITH S, SWERSKY K, et al. Big self-supervised models are strong semi-supervised learners[C]// Proceedings of the 34th Conference on Neural Information Processing Systems. New York: ACM, 2020: 22243-22255. |
58 | KIM S, KIM D, CHO M, et al. Proxy anchor loss for deep metric learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3235-3244. |
59 | CHO J, KANG S, HYUN D, et al. Unsupervised proxy selection for session-based recommender systems[C]// Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2021: 327-336. |
60 | COLEMAN C, YEH C, MUSSMANN S, et al. Selection via proxy: efficient data selection for deep learning[EB/OL]. (2020-10-27) [2023-10-15].. |
61 | LIU Z, LI J, SHEN Z, et al. Learning efficient convolutional networks through network slimming[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2755-2763. |
62 | LIZNERSKI P, RUFF L, VANDERMEULEN R A, et al. Explainable deep one-class classification[EB/OL]. (2021-03-18) [2023-10-15].. |
63 | NA B, MOK J, CHOE H, et al. Accelerating neural architecture search via proxy data[C]// Proceedings of the 30th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2021: 2848-2854. |
64 | ZHENG S, SONG Y, LEUNG T, et al. Improving the robustness of deep neural networks via stability training[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4480-4488. |
65 | POMĚNKOVÁ J, MALACH T. Optimized classifier learning for face recognition performance boost in security and surveillance applications[J]. Sensors, 2023, 23(15): No.7012. |
66 | ZHANG M, JIANG S, CUI Z, et al. D-VAE: a variational autoencoder for directed acyclic graphs[C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. New York: ACM, 2019: 1588-1600. |
67 | TSCHANTZ M C. What is proxy discrimination?[C]// Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. New York: ACM, 2022: 1993-2003. |
68 | TADMOR O, ROSENWEIN T, SHALEV-SHWARTZ S, et al. Learning a metric embedding for face recognition using the multibatch method[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 1396-1397. |
69 | WANG C, JIANG Z, YIN Y, et al. Controlling class layout for deep ordinal classification via constrained proxies learning[C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023: 2483-2491. |
70 | FURUSAWA T. Mean field theory in deep metric learning[EB/OL]. (2023-06-27) [2023-10-15].. |
71 | YAO X, SHE D, ZHANG H, et al. Adaptive deep metric learning for affective image retrieval and classification[J]. IEEE Transactions on Multimedia, 2021, 23: 1640-1653. |
72 | HUA Y, YANG Y, DU J. Deep multi-modal metric learning with multi-scale correlation for image-text retrieval[J]. Electronics, 2020, 9(3): No.466. |
73 | MEI X, LIU X, SUN J, et al. On metric learning for audio-text cross-modal retrieval[C]// Proceedings of the INTERSPEECH 2022. [S.l.]: International Speech Communication Association, 2022: 4142-4146. |
74 | CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 1597-1607. |
75 | MOCANU B, TAPU R, ZAHARIA T. Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning[J]. Image and Vision Computing, 2023, 133: No.104676. |
76 | HUANG Z, SUN Y, HAN C, et al. Modality-aware triplet hard mining for zero-shot sketch-based image retrieval[EB/OL]. (2021-12-16) [2023-10-15].. |
77 | WANG J, SONG Y, LEUNG T, et al. Learning fine-grained image similarity with deep ranking[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1386-1393. |
78 | MA C, SUN H, ZHU J, et al. Normalized maximal margin loss for open-set image classification[J]. IEEE Access, 2021, 9: 54276-54285. |
79 | ZHU Y, YANG M, DENG C, et al. Fewer is more: a deep graph metric learning perspective using fewer proxies[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 17792-17803. |
80 | ZHENG W, ZHANG B, LU J, et al. Deep relational metric learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 12045-12054. |
81 | KO B, GU G, KIM H G. Learning with memory-based virtual classes for deep metric learning[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 11772-11781. |
82 | LIM J, YUN S, PARK S, et al. Hypergraph-induced semantic tuplet loss for deep metric learning[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 212-222. |
83 | CAI B, XIONG P, TIAN S. Center contrastive loss for metric learning[EB/OL]. (2023-08-01) [2023-10-15].. |
84 | ZHAO Y, ZHONG Z, YANG F, et al. Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 6273-6282. |
85 | ZHOU K, YANG Y, CAVALLARO A, et al. Learning generalisable omni-scale representations for person re-identification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5056-5069. |
86 | LIAO S, SHAO L. Graph sampling based deep metric learning for generalizable person re-identification[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 7349-7358. |
87 | CHEN K, GONG T, ZHANG L. Multi-scale query-adaptive convolution for generalizable person re-identification[C]// Proceedings of the 2023 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2023: 2411-2416. |
88 | DESPLANQUES B, THIENPONDT J, DEMUYNCK K. ECAPA-TDNN: emphasized channel attention, propagation and aggregation in TDNN based speaker verification[C]// Proceedings of the INTERSPEECH 2020. [S.l.]: International Speech Communication Association, 2020: 3830-3834. |
89 | ZHU Y, MAK B. Bayesian self-attentive speaker embeddings for text-independent speaker verification[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 1000-1012. |
90 | WANG F, XIANG X, CHENG J, et al. NormFace: L2 hypersphere embedding for face verification[C]// Proceedings of the 25th ACM International Conference on Multimedia. New York: ACM, 2017: 1041-1049. |
91 | LIU W, WEN Y, YU Z, et al. Large-margin softmax loss for convolutional neural networks[C]// Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 507-516. |
92 | LIU W, WEN Y, YU Z, et al. SphereFace: deep hypersphere embedding for face recognition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6738-6746. |
93 | QIAN Q, SHANG L, SUN B, et al. SoftTriple loss: deep metric learning without triplet sampling[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6449-6457. |
94 | VAN DEN OORD A, LI Y, VINYALS O. Representation learning with contrastive predictive coding[EB/OL]. (2019-01-22) [2023-10-15].. |
95 | WANG X, ZHANG H, HUANG W, et al. Cross-batch memory for embedding learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6387-6396. |
96 | WANG H, WANG Y, ZHOU Z, et al. CosFace: large margin cosine loss for deep face recognition[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5265-5274. |
97 | FENG Y, YOU H, ZHANG Z, et al. Hypergraph neural networks[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 3558-3565. |
98 | CUBUK E D, ZOPH B, MANÉ D, et al. Autoaugment: learning augmentation strategies from data[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 113-123. |
99 | HATAYA R, ZDENEK J, YOSHIZOE K, et al. Meta approach to data augmentation optimization[C]// Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2022: 3535-3544. |
100 | TAKASE T, KARAKIDA R, ASOH H. Self-paced data augmentation for training neural networks[J]. Neurocomputing, 2021, 442: 296-306. |
101 | TANG Z, PENG X, LI T, et al. AdaTransform: adaptive data transformation[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2998-3006. |
102 | ZHE X, CHEN S, YAN H. Directional statistics-based deep metric learning for image classification and retrieval[J]. Pattern Recognition, 2019, 93: 113-123. |
103 | HU J, LU J, TAN Y P. Discriminative deep metric learning for face verification in the wild[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1875-1882. |
104 | LU J, HU J, TAN Y P. Discriminative deep metric learning for face and kinship verification[J]. IEEE Transactions on Image Processing, 2017, 26(9): 4269-4282. |
105 | GOLWALKAR R, MEHENDALE N. Masked-face recognition using deep metric learning and FaceMaskNet-21[J]. Applied Intelligence, 2022, 52(11): 13268-13279. |
106 | YUCER S, AKGUL Y S. 3D human action recognition with Siamese-LSTM based deep metric learning[J]. Journal of Image and Graphics, 2018, 6(1): 21-26. |
107 | GUTOSKI M, LAZZARETTI A E, LOPES H S. Deep metric learning for open-set human action recognition in videos[J]. Neural Computing and Applications, 2021, 33(4): 1207-1220. |
108 | SUN Y, ZHU Y, ZHANG Y, et al. Dynamic metric learning: towards a scalable metric space to accommodate multiple semantic scales[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 5389-5398. |
109 | WU S, GONG X. BoundaryFace: a mining framework with noise label self-correction for face recognition[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13673. Cham: Springer, 2022: 91-106. |
110 | LI S, XIA X, GE S, et al. Selective-supervised contrastive learning with noisy labels[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 316-325. |
111 | ZHENG W, HUANG Y, ZHANG B, et al. Dynamic metric learning with cross-level concept distillation[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13684. Cham: Springer, 2022: 197-213. |
112 | RO Y, CHOI J Y. Heterogeneous double-head ensemble for deep metric learning[J]. IEEE Access, 2020, 8: 118525-118533. |
113 | LIU C, YU H, LI B, et al. Noise-resistant deep metric learning with ranking-based instance selection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 6807-6816. |
114 | BURIS L H, PEDRONETTE D C G, PAPA J P, et al. Mixup-based deep metric learning approaches for incomplete supervision[C]// Proceedings of the 2022 IEEE International Conference on Image Processing. Piscataway: IEEE, 2022: 2581-2585. |
115 | MEYER B J, HARWOOD B, DRUMMOND T. Deep metric learning and image classification with nearest neighbour Gaussian kernels[C]// Proceedings of the 2018 IEEE International Conference on Image Processing. Piscataway: IEEE, 2018: 151-155. |
116 | ZHANG H, CISSE M, DAUPHIN Y N, et al. mixup: beyond empirical risk minimization[EB/OL]. (2018-04-27) [2023-10-15].. |
117 | ZHANG D, LI Y, ZHANG Z. Deep metric learning with spherical embedding[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 18772-18783. |
118 | LI Z L, ZHOU Y, BAO R, et al. Deep distance metric learning method based on optimized triplet loss[J]. Journal of Computer Applications, 2021, 41(12): 3480-3484. |
119 | KIM Y, PARK W. Multi-level distance regularization for deep metric learning[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 1827-1835. |