Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3588-3602.DOI: 10.11772/j.issn.1001-9081.2021122118

• Artificial intelligence •

Review on interpretability of deep learning

Xia LEI, Xionglin LUO

  1. College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing 102249, China
  • Received: 2021-12-18  Revised: 2022-02-12  Accepted: 2022-02-23  Online: 2022-03-02  Published: 2022-11-10
  • Contact: Xionglin LUO (luoxl@cup.edu.cn)
  • About author: LEI Xia, born in 1989, Ph.D. candidate. Her research interests include machine learning and optimal control.
    LUO Xionglin, born in 1963, Ph.D., professor. His research interests include control theory, process control, chemical system engineering, and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61703434)

Abstract:

With the widespread application of deep learning, human beings increasingly rely on a large number of complex systems that adopt deep learning techniques. However, the black-box property of deep learning models poses challenges to their use in mission-critical applications and raises ethical and legal concerns. Therefore, making deep learning models interpretable is the first problem to be solved in order to make them trustworthy. As a result, research in the field of interpretable artificial intelligence has emerged, focusing mainly on explaining model decisions or behaviors explicitly to human observers. A review of the interpretability of deep learning was performed to build a good foundation for further in-depth research and for the establishment of more efficient and interpretable deep learning models. Firstly, the interpretability of deep learning was outlined, and the requirements and definitions of interpretability research were clarified. Then, several typical models and algorithms of interpretability research were introduced from three aspects: explaining the logic rules, the decision attributions, and the internal structure representations of deep learning models. In addition, three common methods for constructing intrinsically interpretable models were pointed out. Finally, the four evaluation indicators of fidelity, accuracy, robustness and comprehensibility were briefly introduced, and possible future development directions of deep learning interpretability were discussed.
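To make the notion of decision attribution concrete, the following is a minimal sketch of one common family of attribution methods, gradient-based saliency, written in PyTorch. It is an illustration only, not a method proposed in the review: the untrained resnet18 model and the random input tensor are placeholders chosen for this sketch.

```python
# Minimal sketch of gradient-based decision attribution (saliency).
# Placeholder model and input; assumes torch and torchvision (>= 0.13) are installed.
import torch
from torchvision import models

model = models.resnet18(weights=None)  # untrained placeholder network
model.eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # dummy "image"
scores = model(x)                                    # class scores, shape (1, 1000)
top_class = scores.argmax(dim=1).item()              # predicted class index

# Back-propagate the top-class score to the input pixels; the absolute
# gradient magnitude serves as a simple per-pixel attribution (saliency) map.
scores[0, top_class].backward()
saliency = x.grad.abs().max(dim=1).values            # collapse channels -> (1, 224, 224)
print(saliency.shape)
```

Pixels with large saliency values are those whose perturbation would most change the predicted class score, which is the basic intuition behind the attribution methods surveyed in the paper.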

Key words: deep learning, interpretability, decision attribution, latent representation, evaluation indicator

