Journal of Computer Applications, 2025, Vol. 45, Issue (3): 840-848. DOI: 10.11772/j.issn.1001-9081.2024091297

• Frontier research and typical applications of large models •

Crop disease recognition method based on multi-modal data fusion

Wei CHEN1,2, Changyong SHI1,2, Chuanxiang MA1,2

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China
  2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan, Hubei 430062, China
  • Received: 2024-09-10 Revised: 2024-11-13 Accepted: 2024-11-18 Online: 2024-11-29 Published: 2025-03-10
  • Contact: Chuanxiang MA
  • About author: CHEN Wei, born in 2000 in Wuhan, Hubei, M. S. candidate. His research interests include multi-modal learning and data fusion.
    SHI Changyong, born in 1999 in Qianjiang, Hubei, M. S. candidate. His research interests include natural language processing and data fusion.
  • Supported by:
    National Natural Science Foundation of China (62102136); Hubei Province Key Technology Innovative Program (2024BAB034); Hubei Province Major Technological Innovation Project (2024BAA008)

Abstract:

Existing deep learning-based methods for crop disease recognition rely on specific crop disease image datasets to learn image features, and overlook the role of text features in assisting image feature learning. To improve the model's feature extraction and disease recognition capabilities on crop disease images, a Crop Disease Recognition method based on Contrastive Language-Image Pre-training and multi-modal data fusion (CDR-CLIP) was proposed. Firstly, high-quality image-text pair datasets for disease recognition were constructed, so that textual information enhances the feature representation of crop disease images. Secondly, a multi-modal fusion strategy was applied to combine text features and image features, which strengthened the model's ability to discriminate between diseases. Finally, dedicated pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results show that on the PlantVillage and AI Challenger 2018 crop disease datasets, CDR-CLIP achieves disease recognition accuracies of 99.31% and 87.66%, and F1 values of 99.04% and 87.56%, respectively. On the PlantDoc crop disease dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%. These results indicate the strong performance of CDR-CLIP.

Key words: data fusion, multi-modal, large language model, crop disease recognition, contrastive learning
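
The abstract names contrastive language-image pre-training as the basis of CDR-CLIP but does not state the training objective. As a rough illustration only, the sketch below computes the symmetric contrastive (InfoNCE) loss commonly used in CLIP-style pre-training over a batch of matched image-text pairs. It is not the authors' implementation: the function names, the temperature of 0.07, and the toy 2-D embeddings are all assumptions made for this example, and real embeddings would come from image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of image_embs and text_embs is the positive.

    The temperature value is an assumption for this sketch.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: the correct "class" for row i is column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Hypothetical toy embeddings: two image-text pairs in a 2-D embedding space.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, `aligned` (each image paired with its matching text) is close to zero, while `shuffled` (texts swapped) is much larger. Minimizing this loss is what pulls matching image and text embeddings together during pre-training.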

CLC number: