Crop disease recognition method based on multi-modal data fusion

doi:10.11772/j.issn.1001-9081.2024091297

Journal of Computer Applications, 2025, Vol. 45, Issue (3): 840-848. DOI: 10.11772/j.issn.1001-9081.2024091297

• Frontier research and typical applications of large models •

Wei CHEN¹,², Changyong SHI¹,², Chuanxiang MA¹,²

1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan Hubei 430062, China

Received: 2024-09-10 Revised: 2024-11-13 Accepted: 2024-11-18 Online: 2024-11-29 Published: 2025-03-10
Contact: Chuanxiang MA
About author: CHEN Wei, born in 2000, from Wuhan, Hubei, M.S. candidate. His research interests include multi-modal learning and data fusion.
SHI Changyong, born in 1999, from Qianjiang, Hubei, M.S. candidate. His research interests include natural language processing and data fusion.
Supported by:
National Natural Science Foundation of China (62102136); Hubei Province Key Technology Innovative Program (2024BAB034); Hubei Province Major Technological Innovation Project (2024BAA008)

Abstract

Existing deep learning-based crop disease recognition methods rely on crop-specific disease image datasets for image feature learning and overlook the importance of text features in assisting that learning. To improve the model's feature extraction and disease recognition capabilities on crop disease images, a Crop Disease Recognition method based on Contrastive Language-Image Pre-training and multi-modal data fusion (CDR-CLIP) was proposed. Firstly, high-quality image-text pair datasets for disease recognition were constructed so that textual information could enhance the feature representation of crop disease images. Secondly, a multi-modal fusion strategy was applied to combine text and image features, strengthening the model's ability to discriminate between diseases. Finally, dedicated pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results show that, on the PlantVillage and AI Challenger 2018 crop disease datasets, CDR-CLIP achieves disease recognition accuracies of 99.31% and 87.66% and F1 values of 99.04% and 87.56%, respectively. On the PlantDoc crop disease dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%, the highest among the compared methods.

Key words: data fusion, multi-modal, large language model, crop disease recognition, contrastive learning
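
The abstract builds on Contrastive Language-Image Pre-training, so a small illustration of the general idea may help. The sketch below is a generic CLIP-style symmetric contrastive loss written in plain Python; it is not the authors' implementation, and the function names, the toy embeddings and the temperature of 0.07 are assumptions made only for this example.

```python
import math

def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def _mean_diag_cross_entropy(logits):
    # Cross-entropy of each row against its diagonal entry,
    # i.e. the matched image-text pair is treated as the correct class.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    # Hypothetical helper: symmetric contrastive loss over a batch of pairs,
    # where image_emb[i] and text_emb[i] describe the same sample.
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(transposed))

# Toy batch: two orthogonal embeddings, first correctly paired, then swapped.
pairs = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = clip_style_loss(pairs, pairs)
loss_swapped = clip_style_loss(pairs, pairs[::-1])
```

Correctly paired embeddings give a loss near zero, while swapping the texts makes the loss large, which is the signal that pulls matching image and text features together during pre-training.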

CLC Number: TP391.4

Cite this article: Wei CHEN, Changyong SHI, Chuanxiang MA. Crop disease recognition method based on multi-modal data fusion[J]. Journal of Computer Applications, 2025, 45(3): 840-848.

Figures and tables (12)

Fig. 1 Basic framework of CDR-CLIP

Fig. 2 Example of text generation and refinement

Fig. 3 Detailed structure of each module

Fig. 4 Some disease images in experimental datasets

Tab. 1 Dataset statistics

Dataset	Number of images	Number of texts	Number of valid image-text pairs
PlantVillage	55 446	55 446	55 326
AI Challenger 2018	36 258	36 258	35 200
PlantDoc	2 580	2 580	2 302
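
As a quick reading aid for Tab. 1, the share of images that ended up with a valid image-text pair can be computed directly from the counts. The snippet below is only arithmetic on the published numbers; the dictionary layout and the function name are assumptions introduced for the example.

```python
# Counts copied from Tab. 1: (number of images, number of valid image-text pairs).
DATASETS = {
    "PlantVillage": (55446, 55326),
    "AI Challenger 2018": (36258, 35200),
    "PlantDoc": (2580, 2302),
}

def retention_rates(datasets):
    # Percentage of images that kept a valid text description, per dataset.
    return {
        name: round(100.0 * pairs / images, 2)
        for name, (images, pairs) in datasets.items()
    }

rates = retention_rates(DATASETS)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% of images retained as valid pairs")
```

Under these counts roughly 99.78% of PlantVillage, 97.08% of AI Challenger 2018 and 89.22% of PlantDoc images remain as usable pairs.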

Tab. 2 Performance comparison of different methods on PlantVillage and AI Challenger 2018 datasets (%)

	PlantVillage				AI Challenger 2018			
Method	Accuracy	Precision	Recall	F1	Accuracy	Precision	Recall	F1
SMLP_ResNet18 [12]	99.32	99.10			86.93	-	-	-
Method in [29]	99.32	99.10	98.78	98.92	86.93	84.15	83.42	83.60
DCNN [8]	99.48	-	-	99.23	-	-	-	-
VGG-ICNN [9]	99.16	-	-	-	-	-	-	-
EWPRC-ResNet-t [13]	-	-	-	-	87.42	85.36	84.23	84.79
CDR-CLIP	99.31	99.10	98.98	99.04	87.66	88.26	87.66	87.56

Tab. 3 Performance comparison of different methods on PlantDoc dataset (%)

Method	mAP@0.5	mAP@0.5:0.95
Faster-RCNN-MobileNet [18]	32.80	-
YOLOR-Light-v1 [30]	42.70	32.70
TL-SE-ResNeXt-101 [31]	47.37	-
Method in [32]	48.20	33.30
DETR [33]	48.90	46.30
CDR-CLIP	51.10	33.90
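
The mAP@0.5 and mAP@0.5:0.95 columns in Tab. 3 depend on an intersection-over-union (IoU) threshold for deciding whether a predicted box matches a ground-truth box. This page does not describe the evaluation protocol, so the sketch below assumes the common convention: a detection counts as a true positive at mAP@0.5 when its IoU with the ground truth is at least 0.5, and mAP@0.5:0.95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The function names and box format are illustrative assumptions.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    # A prediction matches the ground truth when IoU reaches the threshold.
    return iou(pred_box, gt_box) >= threshold

gt = (0.0, 0.0, 10.0, 10.0)
pred = (0.0, 0.0, 10.0, 5.0)  # covers the top half of the ground-truth box
overlap = iou(gt, pred)
```

In this toy case the overlap is exactly 0.5, so the prediction counts at the 0.5 threshold but would be rejected at a stricter threshold such as 0.75, which is why mAP@0.5:0.95 is typically lower than mAP@0.5.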

Fig. 5 Comparison of loss reduction on PlantVillage and AI Challenger 2018 datasets

Tab. 4 Results of ablation experiment on PlantVillage and AI Challenger 2018 datasets (%)

Dataset	Text modality	Accuracy	Precision	Recall	F1
PlantVillage	×	98.30	98.33	98.30	98.30
PlantVillage	√	99.31	99.10	98.98	99.04
AI Challenger 2018	×	83.17	84.29	83.17	82.81
AI Challenger 2018	√	87.66	88.26	87.66	87.56
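
To make the effect of the text modality in Tab. 4 easier to read, the gain in percentage points can be computed per metric. This is plain arithmetic on the table values; the data structure and function name are assumptions chosen for the example.

```python
METRICS = ("accuracy", "precision", "recall", "f1")

# Values copied from Tab. 4, in the order given by METRICS.
ABLATION = {
    "PlantVillage": {
        "without_text": (98.30, 98.33, 98.30, 98.30),
        "with_text": (99.31, 99.10, 98.98, 99.04),
    },
    "AI Challenger 2018": {
        "without_text": (83.17, 84.29, 83.17, 82.81),
        "with_text": (87.66, 88.26, 87.66, 87.56),
    },
}

def text_modality_gain(results):
    # Percentage-point difference between the two rows of each dataset.
    gains = {}
    for dataset, rows in results.items():
        gains[dataset] = {
            metric: round(with_text - without_text, 2)
            for metric, without_text, with_text in zip(
                METRICS, rows["without_text"], rows["with_text"]
            )
        }
    return gains

gains = text_modality_gain(ABLATION)
```

With these values the text modality adds about 1.01 points of accuracy on PlantVillage and about 4.49 points on AI Challenger 2018, with F1 gains of 0.74 and 4.75 points respectively.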

Fig. 6 Visualization results of some examples on PlantDoc dataset

Fig. 7 Confusion matrix of disease classification on AI Challenger 2018 dataset

Fig. 8 Similar descriptions of different categories of images in AI Challenger 2018 dataset

References (33)

1	KANG L, YUAN J Q, GAO R, et al. Early detection and identification of rice blast based on hyperspectral image [J]. Spectroscopy and Spectral Analysis, 2021, 41(3): 898-902.
2	JOHANNES A, PICON A, ALVAREZ-GILA A, et al. Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case [J]. Computers and Electronics in Agriculture, 2017, 138: 200-209.
3	AGARWAL M, GUPTA S K, BISWAS K K. Development of efficient CNN model for tomato crop disease identification [J]. Sustainable Computing: Informatics and Systems, 2020, 28: No.100407.
4	ZHAO H Q, YANG Y F, LIU Z L, et al. Step-by-step identification method of crop leaf diseases based on transfer learning [J]. Bulletin of Surveying and Mapping, 2021(7): 34-38.
5	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2024-09-04].
6	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
7	DU H S, ZHANG C H, AN W H, et al. Crop disease recognition based on multi-layer information fusion and saliency feature enhancement [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(7): 214-222.
8	SCHWARZ SCHULER J P, ROMANI S, ABDEL-NASSER M, et al. Color-aware two-branch DCNN for efficient plant disease classification [J]. MENDEL, 2022, 28(1): 55-62.
9	THAKUR P S, SHEOREY T, OJHA A. VGG-ICNN: a lightweight CNN model for crop disease identification [J]. Multimedia Tools and Applications, 2023, 82(1): 497-520.
10	JIANG H H, YANG X H, DING R R, et al. Identification of apple leaf diseases based on improved ResNet18 [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(4): 295-303.
11	HUANG L S, LUO Y W, YANG X D, et al. Crop disease recognition based on attention mechanism and multi-scale residual network [J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(10): 264-271.
12	SUN W B, WANG R, GAO R H, et al. Crop disease recognition based on visible spectrum and improved attention module [J]. Spectroscopy and Spectral Analysis, 2022, 42(5): 1572-1580.
13	XIAO T C, CHEN Y H, LI Y K, et al. Study on crop disease identification model based on improved channel attention mechanism [J]. Jiangsu Agricultural Sciences, 2023, 51(24): 168-175.
14	RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763.
15	LI J, LI D, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models [C]// Proceedings of the 40th International Conference on Machine Learning. New York: JMLR.org, 2023: 19730-19742.
16	LIU H, LI C, WU Q, et al. Visual instruction tuning [C]// Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 34892-34916.
17	HUGHES D P, SALATHÉ M. An open access repository of images on plant health to enable the development of mobile disease diagnostics [EB/OL]. [2023-11-23].
18	SINGH D, JAIN N, JAIN P, et al. PlantDoc: a dataset for visual plant disease detection [C]// Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. New York: ACM, 2020: 249-253.
19	ILHARCO G, WORTSMAN M, WIGHTMAN R, et al. OpenCLIP [CP/OL]. [2023-07-23].
20	SUN Q, YU Q, CUI Y, et al. Emu: generative pretraining in multimodality [EB/OL]. [2024-07-30].
21	BIRD S. NLTK: the natural language toolkit [C]// Proceedings of the Interactive Presentation Sessions of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2006: 69-72.
22	SCHUHMANN A, BEAUMONT R, VENCU R, et al. LAION-5B: an open large-scale dataset for training next generation image-text models [C]// Proceedings of the 36th Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 25278-25294.
23	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
24	DONG Y, CORDONNIER J B, LOUKAS A. Attention is not all you need: pure attention loses rank doubly exponentially with depth [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 2793-2803.
25	LIU Y, ZHANG X, DING J, et al. Knowledge-infused contrastive learning for urban imagery-based socioeconomic prediction [C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 4150-4160.
26	LI T, XIN S, XI Y, et al. Predicting multi-level socioeconomic indicators from structural urban imagery [C]// Proceedings of the 31st ACM International Conference on Information and Knowledge Management. New York: ACM, 2022: 3282-3291.
27	LI Y, MAO H, GIRSHICK R, et al. Exploring plain vision transformer backbones for object detection [C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13669. Cham: Springer, 2022: 280-296.
28	HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988.
29	GAO R H, BAI Q, WANG R, et al. Multi-tree network multi-crop early disease recognition method based on improved attention mechanism [J]. Computer Science, 2022, 49(6A): 363-369.
30	HUANG Q, WU X, WANG Q, et al. Knowledge distillation facilitates the lightweight and efficient plant diseases detection model [J]. Plant Phenomics, 2023, 5: No.0062.
31	WANG D F, WANG J. Crop disease classification with transfer learning and residual networks [J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(4): 199-207.
32	LEE H, AHN S. Improving the performance of object detection by preserving label distribution [J]. Mathematics, 2023, 11(21): No.4460.
33	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 213-229.
