Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (6): 1580-1586. DOI: 10.11772/j.issn.1001-9081.2019111951

• Artificial intelligence

Question classification of common crop disease question answering system based on BERT

YANG Guofeng1,2, YANG Yong1,2   

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
  • Received: 2019-11-15 Revised: 2020-01-04 Online: 2020-06-10 Published: 2020-06-18
  • Contact: YANG Yong, born in 1975, Ph.D., associate research fellow. His research interests include smart agriculture and agricultural information technology.
  • About the authors: YANG Guofeng, born in 1994, M.S. candidate. His research interests include text categorization and affective computing. YANG Yong, born in 1975, Ph.D., associate research fellow. His research interests include smart agriculture and agricultural information technology.
  • Supported by:
    Science and Technology Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).

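Question classification of common crop disease question answering system based on BERT

YANG Guofeng (杨国峰)1,2, YANG Yong (杨勇)1,2

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081
  • Corresponding author: YANG Yong (born 1975)
  • About the authors: YANG Guofeng (born 1994), male, from Chongqing, M.S. candidate, CCF member; main research interests: text classification and affective computing. YANG Yong (born 1975), male, from Haimen, Jiangsu, associate research fellow, Ph.D.; main research interests: smart agriculture and agricultural information technology.
  • Funding:
    Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).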

Abstract: Question classification is a key module of a question answering system and a key factor limiting its retrieval efficiency. User questions in agricultural question answering systems carry complicated semantic information and differ widely from one another. To let users obtain classification results for common crop disease questions quickly and accurately, a question classification model for a common crop disease question answering system was constructed based on Bidirectional Encoder Representations from Transformers (BERT). Firstly, the question dataset was preprocessed. Then, a Bidirectional Long Short-Term Memory (Bi-LSTM) self-attention network classification model, a Transformer classification model and a BERT-based fine-tuning classification model were constructed, and the three models were used to extract information from the questions and to train question classification models. Finally, the BERT-based fine-tuning classification model was tested and the impact of dataset size on the classification results was explored. The experimental results show that the accuracy, precision, recall, and weighted harmonic mean of precision and recall of the BERT-based fine-tuning model are each 2 to 5 percentage points higher than those of the Bi-LSTM self-attention network classification model and the Transformer classification model. On the Common Crop Disease Question Dataset (CCDQD), it reaches a highest accuracy of 92.46%, precision of 92.59%, recall of 91.26%, and weighted harmonic mean of precision and recall of 91.92%. The BERT-based fine-tuning classification model has a simple structure, few parameters to train and fast training, and it classifies common crop disease questions efficiently and accurately. It can therefore serve as the question classification model for a common crop disease question answering system.

Key words: Natural Language Processing (NLP), Bidirectional Encoder Representations from Transformers (BERT), crop disease, question answering system, question classification
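
The weighted harmonic mean reported in the abstract can be sanity-checked against the reported precision and recall. The sketch below assumes equal weights, that is, the standard F1 score; `f1_score` is an illustrative helper written for this check, not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Equal-weight harmonic mean of precision and recall (F1).

    Assumption: the abstract's "weighted harmonic mean" uses equal weights.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall on CCDQD as reported in the abstract, in percent.
reported_precision = 92.59
reported_recall = 91.26

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 91.92%"
```

Under that assumption the result agrees with the 91.92% stated in the abstract, which suggests the reported mean is taken over precision and recall.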

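Abstract: Question classification is a key module of a question answering system and also a key factor restricting its retrieval efficiency. To address the complex semantics and large variation of user questions in agricultural question answering systems, and to meet users' need to obtain classification results for common crop disease questions quickly and accurately, a BERT-based question classification model for a common crop disease question answering system was built. First, the question dataset was preprocessed. Then, a Bidirectional Long Short-Term Memory (Bi-LSTM) self-attention network classification model, a Transformer classification model and a BERT-based fine-tuning classification model were built, and the three models were used to extract information from the questions and to train the question classification models. Finally, the BERT-based fine-tuning classification model was tested, and the effect of dataset size on the classification results was examined. The experimental results show that the accuracy, precision, recall, and weighted harmonic mean of precision and recall of the BERT-based fine-tuning model are each 2 to 5 percentage points higher than those of the Bi-LSTM self-attention network model and the Transformer classification model. On the Common Crop Disease Question Dataset (CCDQD), it achieves a highest accuracy of 92.46%, precision of 92.59%, recall of 91.26%, and weighted harmonic mean of precision and recall of 91.92%. The BERT-based fine-tuning classification model has a simple structure, few training parameters and fast training, classifies common crop disease questions efficiently and accurately, and can serve as the question classification model of a common crop disease question answering system.

Key words: natural language processing, BERT, crop disease, question answering system, question classification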

CLC Number: