Prediction method of hepatocellular carcinoma efficacy based on multimodal fusion attention

Journal of Computer Applications, 2023, Vol. 43, Issue (S2): 41-46. DOI: 10.11772/j.issn.1001-9081.2023020252

Han WEN¹,², Zhongliang FU¹,², Ying ZHAO³, Yu YAO¹,², Ailian LIU³,⁴

1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu, Sichuan 610213, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning 116011, China
4. Dalian Medical Imaging Artificial Intelligence Engineering Technology Research Center, Dalian, Liaoning 116011, China

Received: 2023-03-13  Revised: 2023-03-24  Accepted: 2023-03-29  Online: 2024-01-09  Published: 2023-12-31
Corresponding author: Ailian LIU
About the authors:
Han WEN (born 1993), male, from Hechuan, Chongqing, Ph.D. candidate. Research interests: machine learning, medical image analysis and applications.
Zhongliang FU (born 1967), male, from Hechuan, Chongqing, research fellow, M.S. Research interests: machine learning.
Ying ZHAO (born 1991), female, from Liaoyang, Liaoning, attending physician, Ph.D. Research interests: artificial intelligence for multimodal MRI of liver cancer.
Yu YAO (born 1980), male, from Yibin, Sichuan, research fellow, Ph.D. Research interests: machine learning, pattern recognition.
Ailian LIU (born 1963), female, from Dalian, Liaoning, professor, Ph.D. Research interests: clinical applications of new dual-energy CT and MRI techniques, medical imaging artificial intelligence.
Funding:
National Natural Science Foundation of China (61971091); Sichuan Science and Technology Program (2022YFS0384); Dalian Young Science and Technology Star Project (2022RQ074); Dalian Medical Science Research Program (2212011)

Abstract:

To address the problem that traditional methods for predicting treatment efficacy in hepatocellular carcinoma (HCC) usually rely on a single modality, such as image, clinical, or genetic information, an HCC efficacy prediction method based on multimodal fusion attention was proposed. Firstly, a Residual Network (ResNet) was used to extract image features, and a multi-layer perceptron was used to extract clinical features and conventional radiological features. Then, a multimodal fusion attention module was constructed to fuse the image, clinical, and conventional radiological features by computing the correlation between the features of different modalities. Finally, a classification network was used to classify the treatment efficacy of HCC patients. Experimental results show that, compared with the single-modality prediction method, the accuracy of the proposed method increases by 5.95 percentage points on the experimental dataset, indicating that the proposed method can significantly improve efficacy prediction for HCC patients.

Key words: multimodal fusion, attention mechanism, hepatocellular carcinoma (HCC), efficacy prediction, deep learning
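
The fusion step summarized in the abstract can be illustrated with a minimal sketch. It assumes that each modality (image, clinical, conventional radiological) has already been encoded into a vector of the same length, and it uses plain scaled dot-product scores as the "correlation" between modalities. The function names, the absence of learned projections, and the final concatenation are illustrative assumptions, not the module defined in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features):
    """Fuse equal-length modality vectors with dot-product attention.

    features: list of vectors, one per modality (hypothetical encoders
    are assumed to have produced them). Each modality acts as a query
    over all modalities; the attended vectors are concatenated.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    fused = []
    for query in features:
        # Correlation of this modality with every modality (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Weighted sum of all modality vectors.
        attended = [sum(w * f[i] for w, f in zip(weights, features))
                    for i in range(dim)]
        fused.extend(attended)
    return fused
```

In a full model the fused vector would be passed to a classification network; that stage is omitted here.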

CLC number: TP391.41

Han WEN, Zhongliang FU, Ying ZHAO, Yu YAO, Ailian LIU. Prediction method of hepatocellular carcinoma efficacy based on multimodal fusion attention[J]. Journal of Computer Applications, 2023, 43(S2): 41-46.

References (25)

[1] SUNG H, FERLAY J, SIEGEL R L, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries [J]. CA: a Cancer Journal for Clinicians, 2021, 71(3): 209-249. DOI: 10.3322/caac.21660
[2] EL-SERAG H B, RUDOLPH K L. Hepatocellular carcinoma: epidemiology and molecular carcinogenesis [J]. Gastroenterology, 2007, 132(7): 2557-2576. DOI: 10.1053/j.gastro.2007.04.061
[3] ROSIAK G, PODGÓRSKA J, ROSIAK E, et al. CT/MRI LI-RADS v2017 - review of the guidelines [J]. Polish Journal of Radiology, 2018, 83: 355-365. DOI: 10.5114/pjr.2018.78391
[4] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis [EB/OL]. (2017-07-23) [2019-09-26]. DOI: 10.18653/v1/d17-1115
[5] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors [EB/OL]. (2018-05-31) [2019-10-26]. DOI: 10.18653/v1/p18-1209
[6] LI G, DUAN N, FANG Y, et al. Unicoder-VL: a universal encoder for vision and language by cross-modal pre-training [C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2020: 11336-11344. DOI: 10.1609/aaai.v34i07.6795
[7] WANG Y, SHEN Y, LIU Z, et al. Words can shift: dynamically adjusting word representations using nonverbal behaviors [C]// Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 7216-7223. DOI: 10.1609/aaai.v33i01.33017216
[8] HAN Z, YANG F, HUANG J, et al. Multimodal dynamics: dynamical fusion for trustworthy multimodal classification [C]// Proceedings of the 2022 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 20707-20717. DOI: 10.1109/cvpr52688.2022.02005
[9] ZHU J, ZHOU Y, ZHANG J, et al. Multimodal summarization with guidance of multimodal reference [C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2020: 9749-9756. DOI: 10.1609/aaai.v34i05.6525
[10] YU Z, YU J, FAN J, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1821-1830. DOI: 10.1109/iccv.2017.202
[11] AREVALO J, SOLORIO T, MONTES-Y-GÓMEZ M, et al. Gated multimodal units for information fusion [EB/OL]. (2017-02-07) [2019-06-08]. DOI: 10.1007/s00521-019-04559-1
[12] VANGURI R S, LUO J, AUKERMAN A T, et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer [J]. Nature Cancer, 2022, 3(10): 1151-1164. DOI: 10.1038/s43018-022-00416-8
[13] SUBRAMANIAN V, DO M N, SYEDA-MAHMOOD T. Multimodal fusion of imaging and genomics for lung cancer recurrence prediction [C]// Proceedings of the 2020 IEEE International Symposium on Biomedical Imaging. Piscataway: IEEE, 2020: 804-808. DOI: 10.1109/isbi45749.2020.9098545
[14] LI R, WU X, LI A, et al. HFBSurv: hierarchical multimodal fusion with factorized bilinear models for cancer survival prediction [J]. Bioinformatics, 2022, 38(9): 2587-2594. DOI: 10.1093/bioinformatics/btac113
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 2017 Annual Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 6000-6010.
[16] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. DOI: 10.1109/cvpr.2018.00813
[17] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[18] LI X, WANG W, HU X, et al. Selective kernel networks [C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 510-519. DOI: 10.1109/cvpr.2019.00060
[19] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 2018 European Conference on Computer Vision. Berlin: Springer, 2018: 3-19. DOI: 10.1007/978-3-030-01234-2_1
[20] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3146-3154. DOI: 10.1109/cvpr.2019.00326
[21] HUANG Z, WANG X, HUANG L, et al. CCNet: criss-cross attention for semantic segmentation [C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 603-612. DOI: 10.1109/iccv.2019.00069
[22] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[23] ISHIDA T, YAMANE I, SAKAI T, et al. Do we need zero training loss after achieving zero training error? [EB/OL]. (2021-03-31) [2022-04-13].
[24] LENCIONI R, LLOVET J M. Modified RECIST (mRECIST) assessment for hepatocellular carcinoma [J]. Seminars in Liver Disease, 2010, 30(1): 52-60. DOI: 10.1055/s-0030-1247132
[25] LUO L, XIONG Y, LIU Y, et al. Adaptive gradient methods with dynamic bound of learning rate [EB/OL]. (2019-02-26) [2021-03-18]. DOI: 10.48550/arXiv.1902.09843

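Layer	Output size	Network parameters
Conv layer 1	48×48×32	kernel 7×7×7, 64 filters, stride 2
Max pooling layer		pooling kernel 2×2×2, stride 2
Conv layer 2	24×24×16	kernel 3×3×3, 64 filters; kernel 3×3×3, 64 filters
Conv layer 3	12×12×8	kernel 3×3×3, 128 filters; kernel 3×3×3, 128 filters
Conv layer 4	6×6×4	kernel 3×3×3, 256 filters; kernel 3×3×3, 256 filters
Conv layer 5	3×3×2	kernel 3×3×3, 512 filters; kernel 3×3×3, 512 filters
Fully connected layer	1×1×1	2-dimensional output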
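
The output sizes in the layer table above follow from repeated spatial downsampling, which a short sketch can trace. It rests on several assumptions that the table does not state: an input volume of 96×96×64 (inferred from the 48×48×32 output of the stride-2 first convolution), padding chosen so that each stage divides the size by its stride, and a stride of 2 at the start of conv layers 3 to 5 as in a standard ResNet. The names `STAGES` and `trace_shapes` are hypothetical.

```python
import math

# (stage name, stride). Strides for conv3-conv5 are assumed, not given in the table.
STAGES = [
    ("conv1", 2),
    ("max_pool", 2),
    ("conv2", 1),
    ("conv3", 2),
    ("conv4", 2),
    ("conv5", 2),
]


def trace_shapes(input_shape, stages=STAGES):
    """Return the spatial output shape after each stage.

    Assumes padding such that output = ceil(input / stride) per axis.
    """
    shapes = {}
    shape = tuple(input_shape)
    for name, stride in stages:
        shape = tuple(math.ceil(s / stride) for s in shape)
        shapes[name] = shape
    return shapes


# Assumed input size; see the note above.
shapes = trace_shapes((96, 96, 64))
```

Under these assumptions the traced shapes for conv layers 1 to 5 agree with the sizes listed in the table.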

Method	Acc (%)	R (%)	P (%)	F1 (%)
CE-MRI	75.00	66.67	70.00	66.79
CE-MRI+Clin	79.76	74.76	68.26	79.80
CE-MRI+Radiomic	77.38	72.38	78.54	73.26
CE-MRI+Clin+Radiomic	80.95	77.14	83.40	79.58
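
The accuracy gain quoted in the abstract can be checked directly against the table above. The sketch below copies the accuracy column and computes the improvement of each multimodal combination over the image-only CE-MRI baseline, in percentage points; the dictionary and function names are hypothetical.

```python
# Accuracy column (%) from the modality comparison table.
ACCURACY = {
    "CE-MRI": 75.00,
    "CE-MRI+Clin": 79.76,
    "CE-MRI+Radiomic": 77.38,
    "CE-MRI+Clin+Radiomic": 80.95,
}


def gains_over_baseline(accuracy, baseline="CE-MRI"):
    """Accuracy gain of each method over the baseline, in percentage points."""
    base = accuracy[baseline]
    return {
        name: round(acc - base, 2)
        for name, acc in accuracy.items()
        if name != baseline
    }


gains = gains_over_baseline(ACCURACY)
```

The three-modality combination gives the 5.95-point gain stated in the abstract, while adding only clinical or only radiological features gives smaller gains.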

Method	Acc (%)	R (%)	P (%)	F1 (%)
Concat	77.38	84.70	71.72	77.14
TFN [4]	77.38	87.15	71.17	77.03
LMF [5]	76.19	86.67	68.06	75.27
MFB [10]	78.57	78.57	66.73	71.26
SENet [17]	75.00	83.88	67.23	72.49
GMU [11]	80.95	81.91	75.12	77.58
Proposed method	82.14	82.38	83.67	82.15
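
For readers unfamiliar with the column headings, the sketch below shows how the four metrics are commonly computed for a binary classifier from confusion-matrix counts. It assumes that Acc, R, and P stand for accuracy, recall, and precision, and it does not reproduce the paper's averaging scheme, which this page does not specify; the function name and the example counts are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1 (all in %) from confusion counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. Empty denominators yield 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "Acc": 100 * accuracy,
        "R": 100 * recall,
        "P": 100 * precision,
        "F1": 100 * f1,
    }


# Made-up counts for illustration only; not data from the paper.
example = binary_metrics(tp=8, fp=2, fn=2, tn=8)
```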

Feature dimension	Acc (%)	R (%)	P (%)	F1 (%)
32	76.19	82.86	72.26	75.99
64	82.14	82.38	83.67	82.15
128	78.57	85.24	65.59	73.68
256	79.76	85.24	75.17	79.60
