Text classification of agricultural news based on ERNIE+DPCNN+BiGRU

Journal of Computer Applications, 2023, Vol. 43, Issue 5: 1461-1466. DOI: 10.11772/j.issn.1001-9081.2022040641

• Artificial intelligence •

Senqi YANG¹,², Xuliang DUAN¹,², Zhan XIAO¹,², Songsong LANG¹,², Zhiyong LI¹,²

¹ College of Information Engineering, Sichuan Agricultural University, Ya'an, Sichuan 625014, China
² Agricultural Information Engineering Laboratory, Sichuan Agricultural University, Ya'an, Sichuan 625014, China

Received: 2022-05-07  Revised: 2022-07-15  Accepted: 2022-07-22  Online: 2022-08-12  Published: 2023-05-10
Contact: Xuliang DUAN (duanxuliang@sicau.edu.cn)
About the authors:
YANG Senqi, born in 1997, from Langfang, Hebei, M.S. candidate. His research interests include natural language processing.
DUAN Xuliang, born in 1982, from Tangshan, Hebei, M.S., associate professor. His research interests include smart agriculture, data mining and data cleaning.
XIAO Zhan, born in 2000, from Bazhong, Sichuan, M.S. candidate. His research interests include natural language processing.
LANG Songsong, born in 1997, from Dazhou, Sichuan, M.S. candidate. His research interests include computer vision and object detection.
LI Zhiyong, born in 1985, from Meishan, Sichuan, Ph.D., associate professor. His research interests include agricultural information processing and intelligent decision-making.
Supported by: Natural Science Foundation of Sichuan Province (2022NSFSC0172)

Abstract

To address the poor targeting, unclear categorization and lack of datasets that affect agricultural news, an agricultural news classification model called EGC was proposed, based on Enhanced Representation through kNowledge IntEgration (ERNIE), Deep Pyramid Convolutional Neural Network (DPCNN) and Bidirectional Gated Recurrent Unit (BiGRU). The dataset was first encoded with ERNIE; the features of the news text were then extracted in parallel by an improved DPCNN and a BiGRU; finally, the two sets of features were concatenated and passed through Softmax to obtain the classification result. To make the EGC model better suited to agricultural news classification, the DPCNN was improved by reducing the number of its convolution layers so that more features are preserved. Experimental results show that, compared with ERNIE, the precision, recall and F1 score of the proposed EGC model are improved by 1.47, 1.29 and 1.42 percentage points, respectively, which verifies that EGC outperforms traditional classification models.

Key words: text classification of news, agricultural engineering, Enhanced Representation through kNowledge IntEgration (ERNIE), Deep Pyramid Convolutional Neural Network (DPCNN), Bidirectional Gated Recurrent Unit (BiGRU)
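
The final step described in the abstract (combining the features from the two branches and applying Softmax) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the linear layer before Softmax, the function names, the feature sizes and all numeric values are assumptions chosen for illustration. Only the five class names come from the dataset categories reported on this page.

```python
import math
from typing import List, Sequence

# The five news categories reported for the dataset.
CLASSES = ["fishery", "forestry", "planting", "animal husbandry", "sideline"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    dpcnn_feats: Sequence[float],
    bigru_feats: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Concatenate both branch outputs, apply an (assumed) linear layer, then softmax."""
    fused = list(dpcnn_feats) + list(bigru_feats)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs (hypothetical, not taken from the paper).
dpcnn_feats = [0.2, -0.1]   # stand-in for the DPCNN branch output
bigru_feats = [0.5, 0.3]    # stand-in for the BiGRU branch output
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
bias = [0.0] * len(CLASSES)

probs = fuse_and_classify(dpcnn_feats, bigru_feats, weights, bias)
predicted = CLASSES[probs.index(max(probs))]
print(predicted)
```

In a real model the two feature vectors would come from the DPCNN and BiGRU branches running over the ERNIE encoding, and the weights would be learned rather than hand-picked.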

CLC Number: TP183

Senqi YANG, Xuliang DUAN, Zhan XIAO, Songsong LANG, Zhiyong LI. Text classification of agricultural news based on ERNIE+DPCNN+BiGRU[J]. Journal of Computer Applications, 2023, 43(5): 1461-1466.

Category	Training set	Test set	Validation set	Total	Average length
Total	12 778	1 384	1 386	15 548	18.98
Fishery	2 297	258	258	2 813	21.01
Forestry	1 936	192	193	2 321	19.82
Planting	3 645	356	357	4 358	16.19
Animal husbandry	3 239	371	371	3 981	17.68
Sideline	1 661	207	207	2 075	20.20

Model	Precision	Recall	F1 score
BERT	0.673 8	0.684 2	0.655 4
RoBERTa	0.628 9	0.550 3	0.476 4
MacBERT	0.671 4	0.682 7	0.665 1
BERT+CNN	0.705 1	0.720 6	0.638 8
BERT+BiGRU	0.691 6	0.694 1	0.690 9
BERT+DPCNN	0.686 8	0.691 8	0.668 8
ERNIE	0.883 8	0.880 0	0.879 3
ERNIE+DPCNN	0.883 1	0.880 0	0.878 7
ERNIE+BiGRU	0.885 2	0.881 8	0.881 4
EGC	0.898 5	0.892 9	0.893 5
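
The percentage-point gains quoted in the abstract can be checked directly against the ERNIE and EGC rows of the comparison table above. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Scores copied from the ERNIE and EGC rows of the model comparison table.
baseline = {"precision": 0.8838, "recall": 0.8800, "f1": 0.8793}  # ERNIE
egc = {"precision": 0.8985, "recall": 0.8929, "f1": 0.8935}       # EGC

# Difference expressed in percentage points, rounded to two decimals.
gains = {metric: round((egc[metric] - baseline[metric]) * 100, 2) for metric in baseline}
print(gains)  # {'precision': 1.47, 'recall': 1.29, 'f1': 1.42}
```

The results agree with the 1.47, 1.29 and 1.42 percentage points stated in the abstract.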

Activation function	F1 (Fishery)	F1 (Forestry)	F1 (Planting)	F1 (Animal husbandry)	F1 (Sideline)	F1 (Weighted average)
ReLU	0.873 1	0.905 1	0.885 8	0.946 6	0.770 0	0.893 5
Leaky ReLU	0.881 6	0.962 5	0.887 5	0.947 9	0.809 8	0.902 9
RReLU	0.877 8	0.932 2	0.888 6	0.948 9	0.807 8	0.903 4
PReLU	0.871 3	0.922 1	0.877 1	0.945 2	0.788 2	0.894 6
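
For readers unfamiliar with the activation functions compared above, here is a minimal scalar sketch of three of them under their standard definitions. The default slope of 0.01 for Leaky ReLU and the value 0.25 for the PReLU parameter are assumptions (common library defaults), not settings reported on this page. RReLU is left out because the table does not say which variant was used.

```python
def relu(x: float) -> float:
    """ReLU: pass positive inputs through, zero out negative ones."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: keep a small fixed slope for negative inputs (slope is assumed)."""
    return x if x >= 0 else negative_slope * x


def prelu(x: float, a: float) -> float:
    """PReLU: same shape as Leaky ReLU, but the slope `a` is a learned parameter."""
    return x if x >= 0 else a * x


# Compare the three functions on a negative, a zero and a positive input.
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), prelu(x, a=0.25))
```

The functions differ only on negative inputs: ReLU discards them, while the leaky variants let a scaled signal through.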
