Journal of Computer Applications, 2023, Vol. 43, Issue (3): 674-684. DOI: 10.11772/j.issn.1001-9081.2022020198

• Artificial intelligence

Survey of label noise learning algorithms based on deep learning

Boyi FU^1,2, Yuncong PENG^1,2, Xin LAN^1,2, Xiaolin QIN^1,2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-02-22; Revised: 2022-05-18; Accepted: 2022-05-26; Online: 2022-08-16; Published: 2023-03-10
  • Contact: Xiaolin QIN
  • About the authors: FU Boyi, born in 1998, native of Yueyang, Hunan, M. S. candidate, CCF member. Her research interests include label noise, image semantic understanding and object detection.
    PENG Yuncong, born in 1998, native of Chengdu, Sichuan, M. S. candidate, CCF member. His research interests include statistical machine learning, image semantic understanding and few-shot learning.
    LAN Xin, born in 1998, native of Longyan, Fujian, M. S. candidate, CCF member. Her research interests include deep learning, image semantic understanding and object detection.
    QIN Xiaolin, born in 1980, native of Chongqing, Ph. D., research fellow, CCF member. His research interests include automated reasoning and artificial intelligence.
  • Supported by:
    National Academy of Science Alliance Collaborative Program (Chengdu Branch of the Chinese Academy of Sciences - Chongqing Academy of Science and Technology); Key Regional Program of Science and Technology Service Network Initiative (Type A) (KFJ-STS-QYZD-2021-21-001); Sichuan Science and Technology Program (2019ZDZX0006)

Abstract:

In deep learning, a large number of correctly labeled samples is essential for model training. In practical applications, however, labeling data is costly, and the quality of labeled samples is affected by the annotators' subjective judgment and by the tools and techniques used for manual labeling, so label noise is inevitably introduced during annotation. As a result, the training data available for practical applications generally contains some label noise, and how to train models effectively on such noisily labeled data has become a research hotspot. Focusing on label noise learning algorithms based on deep learning, firstly, the sources, categories and impact of the label noise learning problem were elaborated; secondly, four label noise learning strategies, based respectively on data, loss function, model and training method, were analyzed according to the different elements of machine learning; then, a basic framework for the label noise learning problem in various application scenarios was provided; finally, some optimization ideas were given, and the challenges and future development directions of label noise learning algorithms were outlined.
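To make the loss-function and training-method categories more concrete, here is a minimal illustrative sketch in Python. It is not code from this survey, and it assumes one common noise model: symmetric, class-conditional noise described by a transition matrix T, where T[i][j] is the probability that true label i is recorded as label j. It then shows two widely used remedies. Forward loss correction passes the model's predicted clean-label probabilities through T before computing cross-entropy against the observed label. Small-loss selection keeps the lowest-loss samples as "probably clean", as in Co-teaching-style training. All function names are hypothetical, predicted probabilities are passed in directly instead of coming from a network, and only the standard library is used.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def symmetric_transition_matrix(num_classes: int, noise_rate: float) -> Matrix:
    """Return T with T[i][j] = P(noisy label j | clean label i).

    Symmetric noise (an assumed model): a label is kept with probability
    1 - noise_rate, otherwise flipped uniformly to one of the other classes.
    """
    if num_classes < 2 or not 0.0 <= noise_rate < 1.0:
        raise ValueError("need num_classes >= 2 and 0 <= noise_rate < 1")
    off = noise_rate / (num_classes - 1)
    return [[1.0 - noise_rate if i == j else off for j in range(num_classes)]
            for i in range(num_classes)]


def corrupt_labels(labels: Sequence[int], T: Matrix, rng: random.Random) -> List[int]:
    """Simulate annotation noise: draw each noisy label from row T[clean]."""
    classes = list(range(len(T)))
    return [rng.choices(classes, weights=T[y])[0] for y in labels]


def forward_corrected_ce(probs: Sequence[float], noisy_label: int, T: Matrix) -> float:
    """Cross-entropy against the observed label after forward correction.

    probs are the model's clean-label probabilities p(y = i | x); the implied
    probability of the observed label is sum_i p_i * T[i][noisy_label].
    """
    noisy_prob = sum(p_i * T[i][noisy_label] for i, p_i in enumerate(probs))
    return -math.log(max(noisy_prob, 1e-12))  # clamp avoids log(0)


def small_loss_indices(losses: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the keep_ratio fraction of samples with the smallest loss."""
    k = max(1, round(keep_ratio * len(losses)))
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    T = symmetric_transition_matrix(num_classes=3, noise_rate=0.2)
    clean = [0, 1, 2] * 1000
    noisy = corrupt_labels(clean, T, rng)
    flip_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
    print(f"observed flip rate: {flip_rate:.3f}")  # should be near 0.2

    # A confident prediction of class 0 whose observed label was flipped to 1:
    # plain cross-entropy would be -log(0) (unbounded); corrected loss is -log(0.1).
    print(f"{forward_corrected_ce([1.0, 0.0, 0.0], 1, T):.4f}")  # 2.3026

    # Keep the half of the batch with the smallest losses as "probably clean".
    print(small_loss_indices([0.5, 2.0, 0.1, 3.0], keep_ratio=0.5))  # [2, 0]
```

The two remedies make different assumptions. Forward correction needs T to be known or estimated beforehand. Small-loss selection needs no T, but it relies on the commonly observed tendency of deep networks to fit clean samples before noisy ones, so in practice the keep ratio is usually tied to an estimate of the noise rate.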

Key words: label noise, semi-supervised learning, supervised learning, deep learning, loss function


CLC Number: