Journal of Computer Applications, 2023, Vol. 43, Issue (S1): 39-44. DOI: 10.11772/j.issn.1001-9081.2022091362

• Artificial Intelligence •

Image and text multi-modality fusion classification model based on BERT

Jiaxin LI, Shuguang SU

  1. School of Software Engineering, Huazhong University of Science and Technology, Wuhan Hubei 430074, China
  • Received: 2022-09-13; Revised: 2022-11-29; Accepted: 2022-12-06; Online: 2023-07-04; Published: 2023-06-30
  • Corresponding author: Shuguang SU
  • About the authors: Jiaxin LI (born 2000), female, native of Xianyang, Shaanxi; master's student; research interest: machine learning.
    Shuguang SU (born 1975), male, native of Linxiang, Hunan; associate professor, Ph.D., CCF member; research interest: machine learning. sueagle@163.com
  • Funding:
    Wuhan Science and Technology Plan Project (2019010701011385)


Abstract:

In clinical diagnosis, doctors judge a patient's condition by combining medical images with pathological report texts. Existing Artificial Intelligence (AI) aided diagnosis systems do not make full use of the textual examination content. To address this problem, an Image-Text Multi-Modality model based on Bidirectional Encoder Representations from Transformers (BERT), named ITMMB, was proposed to realize multi-modality fusion and classification of medical images and pathological texts at the feature level. A Residual Network (ResNet) was adopted to preprocess medical images into image word embedding vectors, while word segmentation was applied to pathological texts to obtain text word embedding vectors. Both types of embedding vectors were then input into the BERT model for final classification. In addition, the residual module, learning weights, loss function and pooling layer of ResNet were optimized to meet the input requirements of BERT and to obtain better classification performance. Experimental results on the Open Images dataset show that, compared with aided diagnosis models that use only medical images or only pathological texts, ITMMB improves the micro-averaged F1 score by 38.76 and 4.66 percentage points respectively, indicating that it can effectively assist doctors in clinical diagnosis.
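The abstract describes the feature-level fusion and the evaluation metric only at a high level. The following is a minimal, stdlib-only Python sketch of two pieces under stated assumptions. First, it shows one common way to feed image features into a BERT-style encoder: each ResNet feature vector is linearly projected to the encoder hidden size and placed in a single sequence `[CLS] image tokens [SEP] text tokens [SEP]`, with segment ids separating the two modalities. The layout, the projection matrix `weight`, and the names `project`, `build_fused_input` and `FusedInput` are hypothetical illustrations, not the authors' ITMMB implementation. A real model would use learned embeddings in a deep-learning framework, sum token, segment and position embeddings, and classify from the encoder output. Second, it computes a micro-averaged F1 score (pooling TP, FP and FN over all classes before computing F1) and expresses a gain in percentage points, as in the reported 38.76 and 4.66 point improvements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


def project(feature: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Map one image feature vector (length d_img) to the encoder hidden size.

    `weight` stands in for a learned matrix of shape (hidden, d_img); it is a
    hypothetical parameter, not one taken from the paper.
    """
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


@dataclass
class FusedInput:
    embeddings: List[Vector]   # one vector per sequence position
    token_type_ids: List[int]  # segment ids: 0 = image part, 1 = text part (assumed)
    attention_mask: List[int]  # 1 marks a real (non-padding) position


def build_fused_input(
    image_features: Sequence[Sequence[float]],
    text_embeddings: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    cls_vec: Sequence[float],
    sep_vec: Sequence[float],
) -> FusedInput:
    """Lay out [CLS] img_1..img_k [SEP] txt_1..txt_m [SEP] as one sequence."""
    image_tokens = [project(f, weight) for f in image_features]
    embeddings = [list(cls_vec)] + image_tokens + [list(sep_vec)]
    token_type_ids = [0] * len(embeddings)  # [CLS], image tokens, first [SEP]
    embeddings += [list(t) for t in text_embeddings] + [list(sep_vec)]
    token_type_ids += [1] * (len(text_embeddings) + 1)  # text tokens, final [SEP]
    return FusedInput(embeddings, token_type_ids, [1] * len(embeddings))


def micro_f1(counts: Sequence[Tuple[int, int, int]]) -> float:
    """Micro-averaged F1: pool (TP, FP, FN) over all classes, then compute F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gain_in_percentage_points(new_f1: float, old_f1: float) -> float:
    """Absolute difference of two scores in [0, 1], in percentage points."""
    return (new_f1 - old_f1) * 100


if __name__ == "__main__":
    # Toy sizes: d_img = 3, hidden = 2 (real ResNet/BERT sizes are far larger).
    fused = build_fused_input(
        image_features=[[1, 2, 3], [0, 1, 0]],
        text_embeddings=[[0.5, 0.5]],
        weight=[[1, 0, 0], [0, 1, 1]],
        cls_vec=[0, 0],
        sep_vec=[9, 9],
    )
    print(fused.token_type_ids)  # [0, 0, 0, 0, 1, 1]
    print(micro_f1([(8, 2, 2), (1, 1, 1)]))  # 0.75
    print(round(gain_in_percentage_points(0.80, 0.75), 2))  # 5.0
```

Pooling counts before computing F1 means frequent classes dominate the score. For single-label multi-class classification, where every sample receives exactly one prediction, pooled FP and FN both equal the number of errors, so micro-averaged F1 coincides with accuracy.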

Key words: multi-modality fusion, Residual Network (ResNet), image classification, text classification, feature extraction, BERT (Bidirectional Encoder Representations from Transformers)
