Journal of Computer Applications, 2020, Vol. 40, Issue (2): 352-357. DOI: 10.11772/j.issn.1001-9081.2019081403

• DPCS 2019 •

Hard-negative sample mining for metric learning based on linear assignment

Taiming FU, Yan CHEN, Taoshen LI

  1. College of Computer, Electronics and Information, Guangxi University, Nanning Guangxi 530004, China
  • Received: 2019-07-31 Revised: 2019-09-25 Accepted: 2019-09-25 Online: 2019-11-04 Published: 2020-02-10
  • Contact: Yan CHEN
  • About author: FU Taiming, born in 1995, M. S. candidate. His research interests include optimization of intelligent algorithms and computer vision.
    LI Taoshen, born in 1957, Ph. D., professor. His research interests include intelligent systems and optimization of intelligent algorithms.
  • Supported by:
    the National Natural Science Foundation of China (61762008); the Key Research and Development Program of Guangxi (AB17195014)

基于线性分配的难负样本挖掘度量学习

傅泰铭, 陈燕, 李陶深

  1. 广西大学 计算机与电子信息学院,南宁 530004
  • 通讯作者: 陈燕
  • 作者简介:傅泰铭(1995—),男,广西南宁人,硕士研究生,CCF会员,主要研究方向:智能算法优化、计算机视觉
    李陶深(1957—),男,广西邕宁人,CCF会员,教授,博士,主要研究方向:智能系统、智能算法优化。
  • 基金资助:
    国家自然科学基金资助项目(61762008);广西重点研发计划项目(AB17195014)

Abstract:

Scientists identify the species of whales by the shape and distinctive markings of their tails, but recognition by eye and manual labeling is laborious. Whale tail photo datasets have an unbalanced data distribution: some categories have very few samples, or even a single one. In addition, individual differences between samples are small and the data contain unknown categories, which makes it difficult to label whale identities automatically by image classification. To address the difficulty of applying metric learning to classification in this task, a method based on the Siamese Neural Network (SNN) was proposed in which the Linear Assignment Problem (LAP) algorithm is used for hard-negative sample mining, so that training batches are constructed dynamically during training. Firstly, image feature vectors were extracted from the training samples, and the similarity metric between feature vectors was calculated. Then, LAP was used to assign sample pairs to the model, training batches were constructed dynamically according to the metric score matrix, and the difficult sample pairs were trained in a targeted manner. Experimental results on a whale tail image dataset with unbalanced data distribution and on the CUB-200-2011 dataset show that the proposed algorithm achieves good results in learning minority classes and classifying fine-grained images.
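The batch-construction step described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the similarity metric or the LAP solver, so cosine similarity and an exhaustive search over permutations are assumptions made here for illustration, and all function names (`build_cost_matrix`, `solve_lap_bruteforce`, `mine_hard_negative_pairs`) are hypothetical. The feature vectors are hand-written stand-ins for the output of the SNN branch, which is omitted.

```python
from itertools import permutations
import math

FORBIDDEN = 1e6  # large finite cost for pairs that must never be matched


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_cost_matrix(features, labels):
    """cost[i][j] is low when i and j look alike but carry different labels.

    Same-label pairs (including i == j) keep the FORBIDDEN cost so the
    solver avoids them whenever a valid assignment exists.
    """
    n = len(features)
    cost = [[FORBIDDEN] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j]:
                cost[i][j] = -cosine_similarity(features[i], features[j])
    return cost


def solve_lap_bruteforce(cost):
    """Exact linear assignment by enumeration; only viable for tiny inputs."""
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    return list(min(permutations(range(n)), key=total))


def mine_hard_negative_pairs(features, labels):
    """Return (anchor, negative) index pairs chosen by the assignment."""
    assignment = solve_lap_bruteforce(build_cost_matrix(features, labels))
    # Drop any pair that still landed on a forbidden same-label cell.
    return [(i, j) for i, j in enumerate(assignment) if labels[i] != labels[j]]


# Toy data: four samples, three classes, 2-D features.
features = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.3), (0.0, 1.0)]
labels = ["A", "A", "B", "C"]
pairs = mine_hard_negative_pairs(features, labels)
print(pairs)  # [(0, 2), (1, 3), (2, 0), (3, 1)]
```

In this toy run, sample 1 is most similar to sample 2 among its negatives, yet it is paired with sample 3 because the assignment is one-to-one and sample 2 is already matched to sample 0. A per-sample nearest-negative search would not spread the negatives in this way. For realistic batch sizes the exhaustive search would be replaced by a polynomial-time solver, for example `scipy.optimize.linear_sum_assignment`, and the features would be recomputed as the model trains so that the pairs change between rounds.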

Key words: linear assignment, hard-negative sample mining, metric learning, fine-grained image recognition, Siamese Neural Network (SNN)

摘要:

科学家依靠鲸鱼尾巴的形状及其独特的标记来识别鲸鱼的种类,但靠人眼识别和手工标注的过程非常繁琐。而且鲸鱼尾巴照片数据集存在数据分布不均衡的特点,其中个别种类样本数量极少,甚至仅有一份;同时样本个体差异较小,并且包含未知类别,导致以图像分类的方式完成鲸鱼身份的自动标注存在困难。为解决度量学习在该任务下难以分类的问题,在孪生神经网络(SNN)的基础上,利用线性分配问题(LAP)算法进行难负样本挖掘训练过程从而动态地构筑训练批次。首先对训练样本提取图像特征向量,并计算特征向量的相似性度量;然后通过LAP为模型分配样本对,根据度量分数矩阵动态地构筑训练样本批次,针对性地训练困难样本对。在一个数据分布不平衡的鲸鱼尾巴图像数据集和CUB-200-2011数据集上得到的实验结果表明,所提算法在少数类学习和细粒度图像分类上能取得良好的效果。

关键词: 线性分配, 难负样本挖掘, 度量学习, 细粒度图像识别, 孪生神经网络

CLC Number: