Crop disease recognition method based on multi-modal data fusion

doi:10.11772/j.issn.1001-9081.2024091297

Journal of Computer Applications, 2025, Vol. 45, Issue (3): 840-848. DOI: 10.11772/j.issn.1001-9081.2024091297

Frontier research and typical applications of large models

Wei CHEN¹,², Changyong SHI¹,², Chuanxiang MA¹,²

1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
2. Hubei Key Laboratory of Big Data Intelligent Analysis and Application (Hubei University), Wuhan Hubei 430062, China

Received: 2024-09-10; Revised: 2024-11-13; Accepted: 2024-11-18; Online: 2024-11-29; Published: 2025-03-10
Contact: Chuanxiang MA
About author: CHEN Wei, born in 2000, M. S. candidate. His research interests include multi-modal learning and data fusion.
SHI Changyong, born in 1999, M. S. candidate. His research interests include natural language processing and data fusion.
Supported by:
National Natural Science Foundation of China (62102136); Hubei Province Key Technology Innovative Program (2024BAB034); Hubei Province Major Technological Innovation Project (2024BAA008)

Chinese title: 基于多模态数据融合的农作物病害识别方法

Chinese author names: 陈维 (Wei CHEN), 施昌勇 (Changyong SHI), 马传香 (Chuanxiang MA)

Abstract:

Current deep learning-based methods for crop disease recognition rely on crop-specific disease image datasets for image representation learning and overlook the role of text features in assisting image feature learning. To improve the model's feature extraction and disease recognition on crop disease images, a Crop Disease Recognition method through multi-modal data fusion based on Contrastive Language-Image Pre-training (CDR-CLIP) was proposed. Firstly, high-quality image-text pair datasets for disease recognition were constructed so that textual information could enhance the image feature representation. Then, a multi-modal fusion strategy was applied to integrate text and image features, strengthening the model's ability to distinguish diseases. Finally, dedicated pre-training and fine-tuning strategies were designed to optimize the model's performance on specific crop disease recognition tasks. Experimental results show that CDR-CLIP achieves disease recognition accuracies of 99.31% and 87.66%, with F1 values of 99.04% and 87.56%, on the PlantVillage and AI Challenger 2018 crop disease datasets, respectively. On the PlantDoc dataset, CDR-CLIP achieves a mean Average Precision (mAP@0.5) of 51.10%, the highest mAP@0.5 among the compared methods.
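
As a rough illustration of the contrastive language-image pre-training idea that CDR-CLIP builds on, the sketch below implements the generic symmetric contrastive objective of CLIP [14] in plain Python. It is not the authors' loss or fusion strategy, which the abstract does not specify; the function names, the toy feature vectors, and the temperature of 0.07 are assumptions made for this example.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    image_feats[i] and text_feats[i] are assumed to describe the same sample.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy 2-D features (hypothetical): correctly paired vs. deliberately swapped texts.
matched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(matched < swapped)
```

The loss is lower when each image embedding is closest to its own text embedding, which is the property that lets disease descriptions guide image feature learning.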

Key words: data fusion, multi-modal, large language model, crop disease recognition, contrastive learning

CLC Number: TP391.4

Wei CHEN, Changyong SHI, Chuanxiang MA. Crop disease recognition method based on multi-modal data fusion[J]. Journal of Computer Applications, 2025, 45(3): 840-848.

References 33

1	KANG L, YUAN J Q, GAO R, et al. Early detection and identification of rice blast based on hyperspectral image [J]. Spectroscopy and Spectral Analysis, 2021, 41(3): 898-902.
2	JOHANNES A, PICON A, ALVAREZ-GILA A, et al. Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case [J]. Computers and Electronics in Agriculture, 2017, 138: 200-209.
3	AGARWAL M, GUPTA S K, BISWAS K K. Development of efficient CNN model for tomato crop disease identification [J]. Sustainable Computing: Informatics and Systems, 2020, 28: No.100407.
4	ZHAO H Q, YANG Y F, LIU Z L, et al. Step-by-step identification method of crop leaf diseases based on transfer learning [J]. Bulletin of Surveying and Mapping, 2021(7): 34-38.
5	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2024-09-04].
6	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
7	DU H S, ZHANG C H, AN W H, et al. Crop disease recognition based on multi-layer information fusion and saliency feature enhancement [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(7): 214-222.
8	SCHWARZ SCHULER J P, ROMANI S, ABDEL-NASSER M, et al. Color-aware two-branch DCNN for efficient plant disease classification [J]. MENDEL, 2022, 28(1): 55-62.
9	THAKUR P S, SHEOREY T, OJHA A. VGG-ICNN: a lightweight CNN model for crop disease identification [J]. Multimedia Tools and Applications, 2023, 82(1): 497-520.
10	JIANG H H, YANG X H, DING R R, et al. Identification of apple leaf diseases based on improved ResNet18 [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(4): 295-303.
11	HUANG L S, LUO Y W, YANG X D, et al. Crop disease recognition based on attention mechanism and multi-scale residual network [J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(10): 264-271.
12	SUN W B, WANG R, GAO R H, et al. Crop disease recognition based on visible spectrum and improved attention module [J]. Spectroscopy and Spectral Analysis, 2022, 42(5): 1572-1580.
13	XIAO T C, CHEN Y H, LI Y K, et al. Study on crop disease identification model based on improved channel attention mechanism [J]. Jiangsu Agricultural Sciences, 2023, 51(24): 168-175.
14	RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763.
15	LI J, LI D, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models [C]// Proceedings of the 40th International Conference on Machine Learning. New York: JMLR.org, 2023: 19730-19742.
16	LIU H, LI C, WU Q, et al. Visual instruction tuning [C]// Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 34892-34916.
17	HUGHES D P, SALATHÉ M. An open access repository of images on plant health to enable the development of mobile disease diagnostics [EB/OL]. [2023-11-23].
18	SINGH D, JAIN N, JAIN P, et al. PlantDoc: a dataset for visual plant disease detection [C]// Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. New York: ACM, 2020: 249-253.
19	ILHARCO G, WORTSMAN M, WIGHTMAN R, et al. OpenCLIP [CP/OL]. [2023-07-23].
20	SUN Q, YU Q, CUI Y, et al. Emu: generative pretraining in multimodality [EB/OL]. [2024-07-30].
21	BIRD S. NLTK: the natural language toolkit [C]// Proceedings of the Interactive Presentation Sessions of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2006: 69-72.
22	SCHUHMANN C, BEAUMONT R, VENCU R, et al. LAION-5B: an open large-scale dataset for training next generation image-text models [C]// Proceedings of the 36th Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 25278-25294.
23	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
24	DONG Y, CORDONNIER J B, LOUKAS A. Attention is not all you need: pure attention loses rank doubly exponentially with depth [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 2793-2803.
25	LIU Y, ZHANG X, DING J, et al. Knowledge-infused contrastive learning for urban imagery-based socioeconomic prediction [C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 4150-4160.
26	LI T, XIN S, XI Y, et al. Predicting multi-level socioeconomic indicators from structural urban imagery [C]// Proceedings of the 31st ACM International Conference on Information and Knowledge Management. New York: ACM, 2022: 3282-3291.
27	LI Y, MAO H, GIRSHICK R, et al. Exploring plain vision transformer backbones for object detection [C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13669. Cham: Springer, 2022: 280-296.
28	HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988.
29	GAO R H, BAI Q, WANG R, et al. Multi-tree network multi-crop early disease recognition method based on improved attention mechanism [J]. Computer Science, 2022, 49(6A): 363-369.
30	HUANG Q, WU X, WANG Q, et al. Knowledge distillation facilitates the lightweight and efficient plant diseases detection model [J]. Plant Phenomics, 2023, 5: No.0062.
31	WANG D F, WANG J. Crop disease classification with transfer learning and residual networks [J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(4): 199-207.
32	LEE H, AHN S. Improving the performance of object detection by preserving label distribution [J]. Mathematics, 2023, 11(21): No.4460.
33	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 213-229.

Dataset	Number of images	Number of texts	Number of valid image-text pairs
PlantVillage	55 446	55 446	55 326
AI Challenger 2018	36 258	36 258	35 200
PlantDoc	2 580	2 580	2 302
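
To make the dataset table easier to read, the following sketch computes what share of the collected images ended up in a valid image-text pair. The counts are copied from the table; the helper name `retention_rate` and the dictionary layout are assumptions for this example, and the page does not state why some pairs were judged invalid.

```python
# (number of images, number of valid image-text pairs), copied from the table.
counts = {
    "PlantVillage": (55446, 55326),
    "AI Challenger 2018": (36258, 35200),
    "PlantDoc": (2580, 2302),
}

def retention_rate(images, valid_pairs):
    """Percentage of images that ended up in a valid image-text pair."""
    return 100.0 * valid_pairs / images

rates = {name: round(retention_rate(images, pairs), 2)
         for name, (images, pairs) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% of images kept")
# PlantVillage: 99.78%, AI Challenger 2018: 97.08%, PlantDoc: 89.22%
```

PlantDoc, the smallest of the three datasets, also retains the smallest share of its images.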

Method	mAP@0.5 (%)	mAP@0.5:0.95 (%)
Faster-RCNN-MobileNet [18]	32.80	—
YOLOR-Light-v1 [30]	42.70	32.70
TL-SE-ResNeXt-101 [31]	47.37	—
Method of Ref. [32]	48.20	33.30
DETR [33]	48.90	46.30
CDR-CLIP	51.10	33.90

Dataset	Text modality	Accuracy (%)	Precision (%)	Recall (%)	F1 value (%)
PlantVillage	×	98.30	98.33	98.30	98.30
PlantVillage	√	99.31	99.10	98.98	99.04
AI Challenger 2018	×	83.17	84.29	83.17	82.81
AI Challenger 2018	√	87.66	88.26	87.66	87.56
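
The effect of adding the text modality can be read directly from the ablation table above. The sketch below subtracts the rows without text (×) from the rows with text (√) for accuracy and F1. The values are copied from the table; the key names `image_only` and `with_text` and the helper `text_modality_gain` are hypothetical labels chosen for this example.

```python
# Accuracy and F1 (in percent) copied from the ablation table.
ablation = {
    "PlantVillage": {
        "image_only": {"accuracy": 98.30, "f1": 98.30},
        "with_text": {"accuracy": 99.31, "f1": 99.04},
    },
    "AI Challenger 2018": {
        "image_only": {"accuracy": 83.17, "f1": 82.81},
        "with_text": {"accuracy": 87.66, "f1": 87.56},
    },
}

def text_modality_gain(rows):
    """Improvement in percentage points from adding the text modality."""
    return {metric: round(rows["with_text"][metric] - rows["image_only"][metric], 2)
            for metric in rows["with_text"]}

gains = {name: text_modality_gain(rows) for name, rows in ablation.items()}

for name, gain in gains.items():
    print(name, gain)
# PlantVillage: accuracy +1.01, F1 +0.74
# AI Challenger 2018: accuracy +4.49, F1 +4.75
```

The gain is larger on AI Challenger 2018, where the image-only baseline leaves more room for improvement.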

Related Articles 15

[1]	Chaofeng LU, Ye TAO, Lianqing WEN, Fei MENG, Xiugong QIN, Yongjie DU, Yunlong TIAN. Speaker-emotion voice conversion method with limited corpus based on large language model and pre-trained model [J]. Journal of Computer Applications, 2025, 45(3): 815-822.
[2]	Peng CAO, Guangqi WEN, Jinzhu YANG, Gang CHEN, Xinyi LIU, Xuechun JI. Efficient fine-tuning method of large language models for test case generation [J]. Journal of Computer Applications, 2025, 45(3): 725-731.
[3]	Xiaolin QIN, Xu GU, Dicheng LI, Haiwen XU. Survey and prospect of large language models [J]. Journal of Computer Applications, 2025, 45(3): 685-696.
[4]	Chengzhe YUAN, Guohua CHEN, Dingding LI, Yuan ZHU, Ronghua LIN, Hao ZHONG, Yong TANG. ScholatGPT: a large language model for academic social networks and its intelligent applications [J]. Journal of Computer Applications, 2025, 45(3): 755-764.
[5]	Yuanlong WANG, Tinghua LIU, Hu ZHANG. Commonsense question answering model based on cross-modal contrastive learning [J]. Journal of Computer Applications, 2025, 45(3): 732-738.
[6]	Kun SHENG, Zhongqing WANG. Synaesthesia metaphor analysis based on large language model and data augmentation [J]. Journal of Computer Applications, 2025, 45(3): 794-800.
[7]	Xuefei ZHANG, Liping ZHANG, Sheng YAN, Min HOU, Yubo ZHAO. Personalized learning recommendation in collaboration of knowledge graph and large language model [J]. Journal of Computer Applications, 2025, 45(3): 773-784.
[8]	Jing HE, Yang SHEN, Runfeng XIE. Recognition and optimization of hallucination phenomena in large language models [J]. Journal of Computer Applications, 2025, 45(3): 709-714.
[9]	Yuemei XU, Yuqi YE, Xueyi HE. Bias challenges of large language models: identification, evaluation, and mitigation [J]. Journal of Computer Applications, 2025, 45(3): 697-708.
[10]	Yan YANG, Feng YE, Dong XU, Xuejie ZHANG, Jin XU. Construction of digital twin water conservancy knowledge graph integrating large language model and prompt learning [J]. Journal of Computer Applications, 2025, 45(3): 785-793.
[11]	Chenwei SUN, Junli HOU, Xianggen LIU, Jiancheng LYU. Large language model prompt generation method for engineering drawing understanding [J]. Journal of Computer Applications, 2025, 45(3): 801-807.
[12]	Yanmin DONG, Jiajia LIN, Zheng ZHANG, Cheng CHENG, Jinze WU, Shijin WANG, Zhenya HUANG, Qi LIU, Enhong CHEN. Design and practice of intelligent tutoring algorithm based on personalized student capability perception [J]. Journal of Computer Applications, 2025, 45(3): 765-772.
[13]	Sheng YANG, Yan LI. Contrastive knowledge distillation method for object detection [J]. Journal of Computer Applications, 2025, 45(2): 354-361.
[14]	Qijian CAI, Wei TAN. Semantic graph enhanced multi-modal recommendation algorithm [J]. Journal of Computer Applications, 2025, 45(2): 421-427.
[15]	Xiaosheng YU, Zhixin WANG. Sequential recommendation model based on multi-level graph contrastive learning [J]. Journal of Computer Applications, 2025, 45(1): 106-114.