Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3588-3602.DOI: 10.11772/j.issn.1001-9081.2021122118

• Artificial intelligence •

Review on interpretability of deep learning

Xia LEI, Xionglin LUO

  1. College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing 102249, China
  • Received: 2021-12-18  Revised: 2022-02-12  Accepted: 2022-02-23  Online: 2022-03-02  Published: 2022-11-10
  • Contact: Xionglin LUO (luoxl@cup.edu.cn)
  • About author: LEI Xia, born in 1989, Ph.D. candidate. Her research interests include machine learning and optimal control.
    LUO Xionglin, born in 1963, Ph.D., professor. His research interests include control theory, process control, chemical system engineering, and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61703434)

Abstract:

With the widespread application of deep learning, human beings increasingly rely on a large number of complex systems that adopt deep learning techniques. However, the black-box property of deep learning models poses challenges to their use in mission-critical applications and raises ethical and legal concerns. Therefore, making deep learning models interpretable is the first problem to be solved in order to make them trustworthy. As a result, research in the field of interpretable artificial intelligence has emerged, focusing mainly on explaining model decisions or behaviors explicitly to human observers. A review of the interpretability of deep learning was performed to build a good foundation for further in-depth research and for the establishment of more efficient and interpretable deep learning models. Firstly, the interpretability of deep learning was outlined, and the requirements and definitions of interpretability research were clarified. Then, several typical models and algorithms of interpretability research were introduced from three aspects: explaining the logic rules, the decision attributions, and the internal structure representations of deep learning models. In addition, three common methods for constructing intrinsically interpretable models were pointed out. Finally, the four evaluation indicators of fidelity, accuracy, robustness and comprehensibility were briefly introduced, and possible future development directions of deep learning interpretability were discussed.
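To make the notion of decision attribution concrete, the following is a minimal sketch of one common family of attribution methods, gradient-based saliency, written in PyTorch. It is an illustration only, not a method proposed in the review: the untrained resnet18 model and the random input tensor are placeholders chosen for this sketch.

```python
# Minimal sketch of gradient-based decision attribution (saliency).
# Placeholder model and input; assumes torch and torchvision (>= 0.13) are installed.
import torch
from torchvision import models

model = models.resnet18(weights=None)  # untrained placeholder network
model.eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # dummy "image"
scores = model(x)                                    # class scores, shape (1, 1000)
top_class = scores.argmax(dim=1).item()              # predicted class index

# Back-propagate the top-class score to the input pixels; the absolute
# gradient magnitude serves as a simple per-pixel attribution (saliency) map.
scores[0, top_class].backward()
saliency = x.grad.abs().max(dim=1).values            # collapse channels -> (1, 224, 224)
print(saliency.shape)
```

Pixels with large saliency values are those whose perturbation would most change the predicted class score, which is the basic intuition behind the attribution methods surveyed in the paper.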

Key words: deep learning, interpretability, decision attribution, latent representation, evaluation indicator

