Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1479-1484.DOI: 10.11772/j.issn.1001-9081.2023050880

• The 19th China Conference on Machine Learning (CCML 2023) •

Robust learning method by reweighting examples with negative learning

Boshi ZOU, Ming YANG, Chenchen ZONG, Mingkun XIE, Shengjun HUANG

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
  • Received: 2023-07-05 Revised: 2023-07-21 Accepted: 2023-07-24 Online: 2023-08-07 Published: 2024-05-10
  • Contact: Shengjun HUANG
  • About author: ZOU Boshi, born in 1999 in Shangqiu, Henan, M.S. candidate. His research interests include machine learning.
    YANG Ming, born in 2002 in Lu'an, Anhui. His research interests include machine learning.
    ZONG Chenchen, born in 2000 in Ruzhou, Henan, Ph.D. candidate, CCF member. His research interests include active learning, noisy label learning, and partial label learning.
    XIE Mingkun, born in 1995 in Xiamen, Fujian, Ph.D. candidate, CCF member. His research interests include machine learning.

  • Corresponding author: HUANG Shengjun, born in 1987 in Changsha, Hunan, Ph.D., professor, CCF distinguished member. His research interests include machine learning and data mining.

Abstract:

Noisy label learning methods can effectively use data containing noisy labels to train models, significantly reducing the labeling cost of large-scale datasets. Most existing noisy label learning methods assume that the classes in the dataset are balanced; however, data in many real-world scenarios often contain noisy labels while simultaneously following a long-tailed distribution, which makes it difficult for existing methods to separate clean examples from noisy examples in the tail classes according to training loss or confidence. To solve the noisy long-tailed learning problem, a ReWeighting examples with Negative Learning (NLRW) method was proposed, in which examples were reweighted adaptively based on negative learning. Specifically, at each training epoch, the weight of each example was calculated according to the model's output distributions over head and tail classes, so that the weights of clean examples were close to one while the weights of noisy examples were close to zero. To ensure accurate estimation of the weights, negative learning and cross entropy loss were combined to train the model with a weighted loss function. Experimental results on CIFAR-10 and CIFAR-100 datasets with various imbalance rates and noise rates show that, compared with TBSS (Two-stage Bi-dimensional Sample Selection), the best baseline model for noisy long-tailed classification, the NLRW method improves the average accuracy by 4.79% and 3.46%, respectively.
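To make the combined objective concrete, the sketch below illustrates the general idea of mixing weighted cross entropy with a negative learning loss (which penalizes confidence on a randomly drawn complementary label, i.e. a class the example is assumed not to belong to). This is a minimal illustration, not the paper's implementation: the function names are ours, the per-example weights are assumed to be given (the paper's weight computation from the model's output distributions over head and tail classes is not reproduced), and complementary labels are sampled uniformly.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nlrw_style_loss(logits, labels, weights, rng=None):
    """Illustrative weighted mix of cross entropy and negative learning.

    weights ~ 1 (likely clean): the example is trained mainly with
    cross entropy on its given label; weights ~ 0 (likely noisy): it
    contributes mainly through the negative learning term.
    """
    n, c = logits.shape
    if rng is None:
        rng = np.random.default_rng(0)
    probs = softmax(logits)
    # Cross entropy on the given (possibly noisy) labels.
    ce = -np.log(probs[np.arange(n), labels] + 1e-12)
    # Random complementary labels, guaranteed to differ from the given ones.
    comp = (labels + rng.integers(1, c, size=n)) % c
    # Negative learning loss: -log(1 - p_comp) discourages predicting
    # the complementary class.
    nl = -np.log(1.0 - probs[np.arange(n), comp] + 1e-12)
    return float(np.mean(weights * ce + (1.0 - weights) * nl))
```

In this sketch the weight acts as a soft selector between the two terms, matching the abstract's description that clean examples (weight near one) are fit with cross entropy while suspected-noisy examples (weight near zero) are handled through negative learning.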

Key words: noisy label learning, long-tailed learning, noisy long-tailed learning, example reweighting, negative learning

