Journal of Computer Applications

Crop disease recognition method based on multimodal data fusion

CHEN Wei1,2, SHI Changyong1,2, MA Chuanxiang1,2   

  1. College of Computer Science and Information Engineering, Hubei University; 2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University)
  • Received:2024-09-09 Revised:2024-11-18 Online:2024-11-29 Published:2024-11-29
  • About author: CHEN Wei, born in 2000, M. S. candidate. His research interests include multimodal learning and data fusion. SHI Changyong, born in 1999, M. S. candidate. His research interests include natural language processing and data fusion. MA Chuanxiang, born in 1971, Ph. D., professor. Her research interests include data mining and data fusion.
  • Supported by:
    National Natural Science Foundation of China (62102136), Hubei Province Key Technological Innovation Plan Project (2024BAB034), Hubei Province Major Technological Innovation Plan Project (2024BAA008).

基于多模态数据融合的农作物病害识别方法

陈维1,2,施昌勇1,2,马传香1,2   

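  1. College of Computer Science and Information Engineering, Hubei University; 2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University)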
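  • Corresponding author: MA Chuanxiang (马传香)
  • About author: CHEN Wei (陈维), born in 2000, male, a native of Wuhan, Hubei, M. S. candidate. His research interests include multimodal learning and data fusion. SHI Changyong (施昌勇), born in 1999, male, a native of Qianjiang, Hubei, M. S. candidate. His research interests include natural language processing and data fusion. MA Chuanxiang (马传香), born in 1971, female, a native of Jingzhou, Hubei, Ph. D., professor, master's supervisor. Her research interests include data mining and data fusion.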
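  • Supported by:
    National Natural Science Foundation of China (62102136); Hubei Province Key Technological Innovation Plan Project (2024BAB034); Hubei Province Major Technological Innovation Plan Project (2024BAA008)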

Abstract: Current deep learning-based methods for crop disease recognition typically rely on task-specific crop disease image datasets for representation learning and give little consideration to how text features can assist image feature learning. To improve the model's feature extraction and disease recognition capabilities on crop disease images, a Crop Disease Recognition method through multimodal data fusion based on Contrastive Language-Image Pre-training (CDR-CLIP) was proposed. First, high-quality image-text paired datasets for disease recognition were constructed so that textual information could enhance image feature representation. Second, a multimodal fusion strategy was applied to integrate text and image features, which strengthened the model's ability to discriminate between diseases. Finally, specialized pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results show that CDR-CLIP achieves recognition accuracies of 99.31% and 87.66% on the PlantVillage and AI Challenger 2018 crop disease datasets, respectively, with F1 scores of 99.04% and 87.56%. On the PlantDoc dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%. These results indicate strong performance on all three datasets.

Key words: data fusion, multimodal, large language model, crop disease recognition, contrastive learning

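Abstract (translated from Chinese): Existing crop disease recognition methods based on deep learning models rely on specific crop disease image datasets for image feature learning and overlook the importance of text features in assisting image feature learning. To more effectively improve the model's feature extraction and disease recognition capabilities on crop disease images, a crop disease recognition method based on contrastive language-image pre-training and multimodal data fusion (CDR-CLIP) is proposed. First, high-quality image-text pair datasets for disease recognition are constructed, and textual information is used to enhance the feature representation of crop disease images. Second, a multimodal fusion strategy is used to combine text features with image features, strengthening the model's ability to discriminate between diseases. Finally, dedicated pre-training and fine-tuning strategies are designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results show that on the PlantVillage and AI Challenger 2018 crop disease datasets, CDR-CLIP reaches disease recognition accuracies of 99.31% and 87.66% and F1 scores of 99.04% and 87.56%, respectively; on the PlantDoc crop disease dataset, CDR-CLIP reaches a mean Average Precision (mAP@0.5) of 51.10%, demonstrating a strong performance advantage.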

Key words (translated from Chinese): data fusion, multimodal, large language model, crop disease recognition, contrastive learning
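
The abstract names contrastive language-image pre-training on image-text pairs but does not spell out the objective used by CDR-CLIP. As a rough illustration only, the sketch below shows the symmetric contrastive loss commonly associated with CLIP-style training: matched image and text embeddings are pulled together and mismatched ones pushed apart. The function names, the toy 2-dimensional embeddings, and the temperature of 0.07 are assumptions made for this example and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_embs is assumed to be paired with row i of text_embs.
    The temperature value is an illustrative assumption.
    """
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("need a non-empty batch with one text per image")
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # sim[i][j]: scaled cosine similarity between image i and text j.
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each image should pick out its own caption (rows).
    loss_i2t = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image: each caption should pick out its own image (columns).
    loss_t2i = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2


# Toy batch: two hypothetical disease images and two text descriptions.
image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = clip_contrastive_loss(image_batch, [[0.9, 0.1], [0.1, 0.9]])
shuffled_loss = clip_contrastive_loss(image_batch, [[0.1, 0.9], [0.9, 0.1]])
print(aligned_loss < shuffled_loss)
```

In this toy batch the correctly paired captions yield a lower loss than the swapped ones, which is the signal such pre-training relies on. How CDR-CLIP fuses the resulting text and image features, and how it is fine-tuned, is described in the full paper rather than in this abstract.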
