Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (2): 407-412.DOI: 10.11772/j.issn.1001-9081.2020050730

Special Issue: Data Science and Technology

• Data science and technology •

Patent text classification based on ALBERT and bidirectional gated recurrent unit

WEN Chaodong1, ZENG Cheng1,2,3, REN Junwei1, ZHANG Yan1,2,3   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China;
    2. Hubei Province Engineering Technology Research Center for Software Engineering, Wuhan Hubei 430062, China;
    3. Hubei Engineering Research Center for Smart Government and Artificial Intelligence, Wuhan Hubei 430062, China
  • Received:2020-06-01 Revised:2020-07-22 Online:2021-02-10 Published:2020-08-14
  • Supported by:
    This work is partially supported by the Surface Program of National Natural Science Foundation of China (61977021), the Youth Program of National Natural Science Foundation of China (61902114), the 2019 Hubei Special Project of Technology Innovation (2019ACA144).

  • Corresponding author: ZENG Cheng
  • About the authors: WEN Chaodong (1996-), male, born in Jingzhou, Hubei, M. S. candidate, CCF student member; research interests: natural language processing, text classification. ZENG Cheng (1976-), male, born in Wuhan, Hubei, Ph. D., professor, CCF member; research interests: artificial intelligence, industry software. REN Junwei (1992-), male, born in Yichang, Hubei, M. S. candidate; research interests: natural language processing, recommender systems. ZHANG Yan (1973-), male, born in Yichang, Hubei, Ph. D.; research interests: artificial intelligence, information security.

Abstract: With the rapid growth in the number of patent applications, the demand for automatic classification of patent texts is increasing. Most existing patent text classification algorithms use methods such as Word2vec and Global Vectors (GloVe) to obtain word vector representations of the text, discarding much of the words' positional information and failing to express the complete semantics of the text. To solve these problems, a multi-level patent text classification model named ALBERT-BiGRU was proposed by combining ALBERT (A Lite BERT) with a Bidirectional Gated Recurrent Unit (BiGRU). In this model, dynamic word vectors pre-trained by ALBERT were used in place of the static word vectors trained by traditional methods such as Word2vec, improving the representational ability of the word vectors. The BiGRU neural network was then used for training, preserving to the greatest extent the semantic associations between long-distance words in the patent text. In validation experiments on the patent text dataset published by the State Information Center, compared with Word2vec-BiGRU and GloVe-BiGRU, the accuracy of ALBERT-BiGRU was increased by 9.1 percentage points and 10.9 percentage points respectively at the section level of the patent texts, and by 9.5 percentage points and 11.2 percentage points respectively at the class level. Experimental results show that ALBERT-BiGRU can effectively improve the classification of patent texts at different hierarchy levels.

Key words: patent text, text classification, A Lite BERT (ALBERT), Bidirectional Gated Recurrent Unit (BiGRU), word vector
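The architecture described in the abstract can be sketched as follows: contextual (dynamic) token vectors produced by ALBERT are fed to a bidirectional GRU, and the concatenated final forward/backward hidden states are mapped to patent-category scores. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; all layer sizes, the class count, and the random tensor standing in for ALBERT's output are assumptions.

```python
import torch
import torch.nn as nn

class ALBertBiGRUClassifier(nn.Module):
    """Hypothetical sketch of an ALBERT-BiGRU-style classifier."""

    def __init__(self, albert_hidden=768, gru_hidden=128, num_classes=8):
        super().__init__()
        # BiGRU reads the contextual vectors in both directions, preserving
        # semantic associations between long-distance words
        self.bigru = nn.GRU(albert_hidden, gru_hidden,
                            batch_first=True, bidirectional=True)
        # concatenated forward/backward final states -> class scores
        self.fc = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, token_vectors):
        # token_vectors: (batch, seq_len, albert_hidden),
        # e.g. the last hidden states of a pre-trained ALBERT encoder
        _, h_n = self.bigru(token_vectors)       # h_n: (2, batch, gru_hidden)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * gru_hidden)
        return self.fc(h)

model = ALBertBiGRUClassifier()
# stand-in for ALBERT contextual embeddings of a 4-document batch
dummy = torch.randn(4, 32, 768)
logits = model(dummy)
print(logits.shape)  # torch.Size([4, 8])
```

In practice the `dummy` tensor would be replaced by the output of a pre-trained ALBERT model run over the tokenized patent text, and `num_classes` would match the number of categories at the chosen level of the patent hierarchy.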

