Journal of Computer Applications, 2025, Vol. 45, Issue (3): 840-848. DOI: 10.11772/j.issn.1001-9081.2024091297
• Frontier research and typical applications of large models •
Wei CHEN1,2, Changyong SHI1,2, Chuanxiang MA1,2
Received: 2024-09-10
Revised: 2024-11-13
Accepted: 2024-11-18
Online: 2024-11-29
Published: 2025-03-10
Contact: Chuanxiang MA
About author: CHEN Wei, born in 2000 in Wuhan, Hubei, M. S. candidate. His research interests include multi-modal learning and data fusion.
Wei CHEN, Changyong SHI, Chuanxiang MA. Crop disease recognition method based on multi-modal data fusion[J]. Journal of Computer Applications, 2025, 45(3): 840-848.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024091297
Dataset | Number of images | Number of texts | Number of valid image-text pairs |
---|---|---|---|
PlantVillage | 55 446 | 55 446 | 55 326 |
AI Challenger 2018 | 36 258 | 36 258 | 35 200 |
PlantDoc | 2 580 | 2 580 | 2 302 |
Tab. 1 Dataset statistics
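As a reading aid for Tab. 1, the share of images that kept a valid text pairing can be worked out directly from the listed counts. The sketch below is only arithmetic on the table values; the names `DATASETS` and `valid_pair_ratio` are illustrative and do not come from the paper.

```python
# Counts copied from Tab. 1: (images, texts, valid image-text pairs).
DATASETS = {
    "PlantVillage": (55446, 55446, 55326),
    "AI Challenger 2018": (36258, 36258, 35200),
    "PlantDoc": (2580, 2580, 2302),
}


def valid_pair_ratio(images: int, valid_pairs: int) -> float:
    """Return the percentage of images that ended up in a valid image-text pair."""
    return 100.0 * valid_pairs / images


if __name__ == "__main__":
    for name, (images, _texts, valid_pairs) in DATASETS.items():
        dropped = images - valid_pairs
        ratio = valid_pair_ratio(images, valid_pairs)
        print(f"{name}: {ratio:.2f}% valid pairs, {dropped} images without a valid pair")
```

The page does not say why some pairs were discarded, so the sketch reports only how many were.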
Method | PlantVillage Accuracy | PlantVillage Precision | PlantVillage Recall | PlantVillage F1 | AI Challenger 2018 Accuracy | AI Challenger 2018 Precision | AI Challenger 2018 Recall | AI Challenger 2018 F1 |
---|---|---|---|---|---|---|---|---|
SMLP_ResNet18[ | 99.32 | 99.10 | 86.93 | — | — | — | |
Ref. [ | 99.32 | 99.10 | 98.78 | 98.92 | 86.93 | 84.15 | 83.42 | 83.60 |
DCNN[ | 99.48 | — | — | 99.23 | — | — | — | — |
VGG-ICNN[ | 99.16 | — | — | — | — | — | — | — |
EWPRC-ResNet-t[ | — | — | — | — | 87.42 | 85.36 | 84.23 | 84.79 |
CDR-CLIP | 99.31 | 99.10 | 98.98 | 99.04 | 87.66 | 88.26 | 87.66 | 87.56 |
Tab. 2 Performance comparison of different methods on PlantVillage and AI Challenger 2018 datasets
Method | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|
Faster-RCNN-MobileNet[ | 32.80 | — |
YOLOR-Light-v1[ | 42.70 | 32.70 |
TL-SE-ResNeXt-101[ | 47.37 | — |
Ref. [ | 48.20 | 33.30 |
DETR[ | 48.90 | 46.30 |
CDR-CLIP | 51.10 | 33.90 |
Tab. 3 Performance comparison of different methods on PlantDoc dataset
Dataset | Text modality | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|
PlantVillage | × | 98.30 | 98.33 | 98.30 | 98.30 |
PlantVillage | √ | 99.31 | 99.10 | 98.98 | 99.04 |
AI Challenger 2018 | × | 83.17 | 84.29 | 83.17 | 82.81 |
AI Challenger 2018 | √ | 87.66 | 88.26 | 87.66 | 87.56 |
Tab. 4 Results of ablation experiment on PlantVillage and AI Challenger 2018 datasets
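The effect of the text modality in Tab. 4 is easiest to read as a per-metric difference between the √ and × rows. The following sketch computes those differences from the table values; the dictionary layout and the names `ABLATION`, `without_text`, `with_text` and `modality_gain` are assumptions made for illustration, not part of the authors' code.

```python
# Values copied from Tab. 4; "without_text" is the × row, "with_text" the √ row.
ABLATION = {
    "PlantVillage": {
        "without_text": {"accuracy": 98.30, "precision": 98.33, "recall": 98.30, "f1": 98.30},
        "with_text": {"accuracy": 99.31, "precision": 99.10, "recall": 98.98, "f1": 99.04},
    },
    "AI Challenger 2018": {
        "without_text": {"accuracy": 83.17, "precision": 84.29, "recall": 83.17, "f1": 82.81},
        "with_text": {"accuracy": 87.66, "precision": 88.26, "recall": 87.66, "f1": 87.56},
    },
}


def modality_gain(rows: dict) -> dict:
    """Return the per-metric difference (with text minus without text)."""
    return {
        metric: round(rows["with_text"][metric] - rows["without_text"][metric], 2)
        for metric in rows["without_text"]
    }


if __name__ == "__main__":
    for dataset, rows in ABLATION.items():
        print(dataset, modality_gain(rows))
```

On these numbers the accuracy difference is 1.01 on PlantVillage and 4.49 on AI Challenger 2018, so the text modality contributes more on the dataset where the image-only baseline is weaker.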
1 | KANG L, YUAN J Q, GAO R, et al. Early detection and identification of rice blast based on hyperspectral image [J]. Spectroscopy and Spectral Analysis, 2021, 41(3): 898-902. |
2 | JOHANNES A, PICON A, ALVAREZ-GILA A, et al. Automatic plant disease diagnosis using mobile capture devices, applied on a wheat use case [J]. Computers and Electronics in Agriculture, 2017, 138: 200-209. |
3 | AGARWAL M, GUPTA S K, BISWAS K K. Development of efficient CNN model for tomato crop disease identification [J]. Sustainable Computing: Informatics and Systems, 2020, 28: No.100407. |
4 | ZHAO H Q, YANG Y F, LIU Z L, et al. Step-by-step identification method of crop leaf diseases based on transfer learning [J]. Bulletin of Surveying and Mapping, 2021(7): 34-38. |
5 | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2024-09-04]. . |
6 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
7 | DU H S, ZHANG C H, AN W H, et al. Crop disease recognition based on multi-layer information fusion and saliency feature enhancement [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(7): 214-222. |
8 | SCHWARZ SCHULER J P, ROMANI S, ABDEL-NASSER M, et al. Color-aware two-branch DCNN for efficient plant disease classification [J]. MENDEL, 2022, 28(1): 55-62. |
9 | THAKUR P S, SHEOREY T, OJHA A. VGG-ICNN: a lightweight CNN model for crop disease identification [J]. Multimedia Tools and Applications, 2023, 82(1): 497-520. |
10 | JIANG H H, YANG X H, DING R R, et al. Identification of apple leaf diseases based on improved ResNet18 [J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(4): 295-303. |
11 | HUANG L S, LUO Y W, YANG X D, et al. Crop disease recognition based on attention mechanism and multi-scale residual network [J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(10): 264-271. |
12 | SUN W B, WANG R, GAO R H, et al. Crop disease recognition based on visible spectrum and improved attention module [J]. Spectroscopy and Spectral Analysis, 2022, 42(5): 1572-1580. |
13 | XIAO T C, CHEN Y H, LI Y K, et al. Study on crop disease identification model based on improved channel attention mechanism [J]. Jiangsu Agricultural Sciences, 2023, 51(24): 168-175. |
14 | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 8748-8763. |
15 | LI J, LI D, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models [C]// Proceedings of the 40th International Conference on Machine Learning. New York: JMLR.org, 2023: 19730-19742. |
16 | LIU H, LI C, WU Q, et al. Visual instruction tuning [C]// Proceedings of the 37th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2024: 34892-34916. |
17 | HUGHES D P, SALATHÉ M. An open access repository of images on plant health to enable the development of mobile disease diagnostics [EB/OL]. [2023-11-23]. . |
18 | SINGH D, JAIN N, JAIN P, et al. PlantDoc: a dataset for visual plant disease detection [C]// Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. New York: ACM, 2020: 249-253. |
19 | ILHARCO G, WORTSMAN M, WIGHTMAN R, et al. OpenCLIP[CP/OL]. [2023-07-23]. . |
20 | SUN Q, YU Q, CUI Y, et al. Emu: generative pretraining in multimodality [EB/OL]. [2024-07-30]. . |
21 | BIRD S. NLTK: the natural language toolkit [C]// Proceedings of the Interactive Presentation Sessions of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2006: 69-72. |
22 | SCHUHMANN C, BEAUMONT R, VENCU R, et al. LAION-5B: an open large-scale dataset for training next generation image-text models [C]// Proceedings of the 36th Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 25278-25294. |
23 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. |
24 | DONG Y, CORDONNIER J B, LOUKAS A. Attention is not all you need: pure attention loses rank doubly exponentially with depth [C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 2793-2803. |
25 | LIU Y, ZHANG X, DING J, et al. Knowledge-infused contrastive learning for urban imagery-based socioeconomic prediction [C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 4150-4160. |
26 | LI T, XIN S, XI Y, et al. Predicting multi-level socioeconomic indicators from structural urban imagery [C]// Proceedings of the 31st ACM International Conference on Information and Knowledge Management. New York: ACM, 2022: 3282-3291. |
27 | LI Y, MAO H, GIRSHICK R, et al. Exploring plain vision transformer backbones for object detection [C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13669. Cham: Springer, 2022: 280-296. |
28 | HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. |
29 | GAO R H, BAI Q, WANG R, et al. Multi-tree network multi-crop early disease recognition method based on improved attention mechanism [J]. Computer Science, 2022, 49(6A): 363-369. |
30 | HUANG Q, WU X, WANG Q, et al. Knowledge distillation facilitates the lightweight and efficient plant diseases detection model [J]. Plant Phenomics, 2023, 5: No.0062. |
31 | WANG D F, WANG J. Crop disease classification with transfer learning and residual networks [J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(4): 199-207. |
32 | LEE H, AHN S. Improving the performance of object detection by preserving label distribution [J]. Mathematics, 2023, 11(21): No.4460. |
33 | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 213-229. |