Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (3): 697-708. DOI: 10.11772/j.issn.1001-9081.2024091350

• Frontier Research and Typical Applications of Large Models •

Bias challenges of large language models: identification, evaluation, and mitigation

Yuemei XU1, Yuqi YE2, Xueyi HE1

  1. School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China
    2. International Business School, Beijing Foreign Studies University, Beijing 100089, China
  • Received: 2024-09-24; Revised: 2024-12-09; Accepted: 2024-12-13; Online: 2025-03-17; Published: 2025-03-10
  • Contact: Yuemei XU
  • About authors: YE Yuqi, born in 2002 in Meishan, Sichuan, M.S. candidate. Her research interests include natural language processing.
    HE Xueyi, born in 2003 in Mianyang, Sichuan. Her research interests include natural language processing.
  • Supported by:
    National Social Science Foundation of China (24CYY107); Humanities and Social Sciences Research Project of the Ministry of Education (22YJA630018); CIPSC-SMP-Zhipu.AI Large Model Cross-Disciplinary Foundation; Fundamental Research Funds for the Central Universities (2024TD001)


Abstract:

Biases in the output of Large Language Models (LLMs) make LLMs unsafe and uncontrollable. To address this problem, the research status, techniques, and limitations of existing work on LLM bias were reviewed and analyzed from three perspectives: bias identification, bias evaluation, and bias mitigation. Firstly, the three key techniques of LLMs were outlined to analyze the root causes of the intrinsic bias that LLMs inevitably exhibit. Secondly, the biases in existing LLMs were categorized into three types, namely linguistic bias, demographic bias, and evaluation bias, and the characteristics and causes of these biases were analyzed. Thirdly, the existing bias evaluation benchmarks for LLMs were reviewed systematically, and the strengths and limitations of general-purpose, language-specific, and task-specific benchmarks were discussed. Finally, existing LLM debiasing techniques were analyzed in depth from the two perspectives of model debiasing and data debiasing, and directions for their improvement were pointed out. In addition, three future research directions for LLM bias were identified: multi-cultural evaluation of bias, lightweight bias mitigation techniques, and enhanced interpretability of bias.

Key words: Large Language Model (LLM), bias tracing, bias identification, bias evaluation, bias mitigation

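To make the data-debiasing perspective mentioned in the abstract concrete, below is a minimal sketch of counterfactual data augmentation, one common data-side debiasing technique: each training sentence gets a counterpart with demographic terms swapped, so the corpus is balanced across groups. The swap list and example sentences are illustrative assumptions, not material from the paper itself.

```python
import re

# Toy bidirectional swap list (an assumption for illustration only;
# real systems use much larger curated term lists).
SWAP_PAIRS = {"he": "she", "she": "he", "his": "her", "her": "his",
              "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Return the sentence with every listed demographic term swapped."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP_PAIRS[word.lower()]
        # Preserve the capitalisation of the original token.
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = re.compile(r"\b(" + "|".join(SWAP_PAIRS) + r")\b", re.IGNORECASE)
    return pattern.sub(repl, sentence)

def augment(corpus: list[str]) -> list[str]:
    """Balance the corpus by appending a counterfactual copy of each sentence."""
    return corpus + [counterfactual(s) for s in corpus]

corpus = ["He is a doctor and his work is respected."]
print(augment(corpus)[1])  # She is a doctor and her work is respected.
```

Training on the augmented corpus exposes the model to both demographic variants of every context, which is the intuition behind data-side mitigation of demographic bias.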

CLC number: