Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (4): 1108-1115. DOI: 10.11772/j.issn.1001-9081.2021071180
Special Issue: The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)
Lie PAN1, Cheng ZENG1,2,3, Haifeng ZHANG1, Chaodong WEN1, Rusong HAO1, Peng HE1,2,3
Received: 2021-07-08
Revised: 2021-08-27
Accepted: 2021-08-31
Online: 2021-09-08
Published: 2022-04-10
Contact: Cheng ZENG
About author: PAN Lie, born in 1997, M. S. candidate. His research interests include natural language processing and text classification.
Lie PAN, Cheng ZENG, Haifeng ZHANG, Chaodong WEN, Rusong HAO, Peng HE. Text sentiment analysis method combining generalized autoregressive pre-training language model and recurrent convolutional neural network[J]. Journal of Computer Applications, 2022, 42(4): 1108-1115.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071180
Tab. 1 Dataset statistical information

| Dataset | Average length | Positive texts | Negative texts | Total samples |
| --- | --- | --- | --- | --- |
| wb | 80 | 59 994 | 59 996 | 119 990 |
| wm10 | 40 | 4 001 | 7 987 | 11 988 |
| Chn | 140 | 5 322 | 2 444 | 7 766 |
Tab. 2 Data cleaning result

| Sentiment | Before cleaning | After cleaning |
| --- | --- | --- |
| Positive | [花心][鼓掌]//@小懒猫Melody2011:[春暖花开] | [花心][鼓掌][春暖花开] |
| Negative | 跳黄河吧//@懒人业余餐厅郎园店:亲,内测好么[衰] | 跳黄河吧 亲,内测好么[衰] |
| Negative | 这次照片拍得不好不好。相机出问题了,刚去就发现了//@好吃 的顿号:小钟片片拍得果然靓,上菜问题偶已向老板反馈了! | 这次照片拍得不好不好。相机出问题了,刚去就发现了 小钟片片拍得果然靓,上菜问题偶已向老板反馈了! |
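The cleaning step illustrated in Tab. 2 strips Weibo retweet markers of the form `//@username:` while keeping the original comment and emoticon tags such as `[鼓掌]`. A minimal regex sketch of such a step follows; the function name and pattern are illustrative assumptions, not the authors' published code.

```python
import re

def clean_weibo(text: str) -> str:
    """Remove Weibo retweet markers like '//@username:' while keeping
    the surrounding content and emoticon tags such as [鼓掌]."""
    # '//@<user>:' with an ASCII or full-width colon; \w also matches Chinese characters
    cleaned = re.sub(r"//@[\w\-]+\s*[::]", " ", text)
    # Collapse whitespace introduced by the removal
    return re.sub(r"\s+", " ", cleaned).strip()

if __name__ == "__main__":
    print(clean_weibo("[花心][鼓掌]//@小懒猫Melody2011:[春暖花开]"))
    # -> "[花心][鼓掌] [春暖花开]"
```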
Tab. 3 Model training parameters

| Model | Parameter | Value |
| --- | --- | --- |
| XLNet | Embedding layer size | 64 |
| | Hidden layer size | 768 |
| | Number of hidden layers | 12 |
| | Activation function | ReLU |
| RCNN | Hidden layer size | 100 |
| | Recurrent network | BiGRU |
| | Pooling method | MaxPool |
| | Dropout | 0.5 |
| XLNet-RCNN | Batch size | 100 |
| | Epochs | 20 |
| | Loss function | Cross-entropy |
| | Optimizer | Adam |
| | Learning rate | 1×10⁻⁵ |
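Tab. 3 implies the following pipeline: XLNet supplies 768-dimensional contextual token representations, a bidirectional GRU (hidden size 100) re-encodes them, max-pooling collapses the sequence, and a dropout-regularized linear layer produces the sentiment logits. Below is a minimal PyTorch sketch under these assumptions; the Hugging Face `XLNetModel` API and the `hfl/chinese-xlnet-base` checkpoint are assumptions, and the published model may combine token and recurrent-context vectors differently.

```python
import torch
import torch.nn as nn
from transformers import XLNetModel  # Hugging Face Transformers

class XLNetRCNN(nn.Module):
    """Sketch of the classifier implied by Tab. 3: XLNet contextual embeddings
    -> BiGRU -> max-pooling over the sequence -> dropout -> linear classifier."""

    def __init__(self, pretrained="hfl/chinese-xlnet-base",
                 gru_hidden=100, num_classes=2, dropout=0.5):
        super().__init__()
        self.xlnet = XLNetModel.from_pretrained(pretrained)  # hidden size 768
        self.bigru = nn.GRU(input_size=768, hidden_size=gru_hidden,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, 768) contextual token representations from XLNet
        hidden = self.xlnet(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        context, _ = self.bigru(hidden)        # (batch, seq_len, 2 * gru_hidden)
        pooled, _ = torch.max(context, dim=1)  # max-pooling over the sequence
        return self.fc(self.dropout(pooled))   # (batch, num_classes)

# Training settings from Tab. 3 (cross-entropy loss, Adam, lr = 1e-5, 20 epochs, batch size 100):
# model = XLNetRCNN()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
# criterion = nn.CrossEntropyLoss()
```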
Tab. 4 Evaluation results of models on three datasets (unit: %)

| Dataset | Model | P | R | F1 | Acc |
| --- | --- | --- | --- | --- | --- |
| wb | TextCNN | 87.6 | 87.5 | 87.4 | 87.5 |
| | TextRCNN | 88.6 | 88.5 | 88.4 | 88.5 |
| | BERT | 95.1 | 95.0 | 95.0 | 95.0 |
| | XLNet | 96.0 | 95.9 | 95.9 | 95.9 |
| | XLNet-CNN | 96.2 | 96.1 | 96.1 | 96.1 |
| | XLNet-RCNN | 96.5 | 96.4 | 96.4 | 96.4 |
| wm10 | TextCNN | 85.5 | 85.6 | 85.5 | 85.6 |
| | TextRCNN | 84.4 | 84.6 | 84.2 | 84.6 |
| | BERT | 88.7 | 88.3 | 87.9 | 88.3 |
| | XLNet | 91.2 | 90.9 | 90.9 | 90.9 |
| | XLNet-CNN | 91.3 | 91.9 | 91.9 | 91.9 |
| | XLNet-RCNN | 91.7 | 91.8 | 91.8 | 91.8 |
| Chn | TextCNN | 81.8 | 81.7 | 81.7 | 81.7 |
| | TextRCNN | 82.3 | 82.7 | 82.7 | 82.7 |
| | BERT | 87.3 | 86.9 | 86.9 | 86.9 |
| | XLNet | 92.1 | 92.1 | 92.1 | 92.1 |
| | XLNet-CNN | 92.1 | 92.2 | 92.2 | 92.2 |
| | XLNet-RCNN | 92.8 | 92.9 | 92.9 | 92.9 |
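P, R, F1 and Acc in Tab. 4 are the usual precision, recall, F1-score and accuracy. A minimal scikit-learn sketch of how such scores can be computed for a two-class task is given below; the macro-averaging choice is an assumption, as the table does not state the averaging scheme.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

def evaluate(y_true, y_pred):
    """Return P, R, F1 and Acc (in %) for binary sentiment labels (0 = negative, 1 = positive)."""
    return {
        "P":   100 * precision_score(y_true, y_pred, average="macro"),
        "R":   100 * recall_score(y_true, y_pred, average="macro"),
        "F1":  100 * f1_score(y_true, y_pred, average="macro"),
        "Acc": 100 * accuracy_score(y_true, y_pred),
    }

print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```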
Tab. 5 Text sample test results

| Text | Predicted value | Sentiment polarity |
| --- | --- | --- |
| 比预计时间快了好多,又快又好,好评!! | 0.986 | Positive |
| 房间很冷且不隔音,服务一般,卫生间很小,房间整体舒适性在我住过的如家是最差的 | 0.089 | Negative |
| 土豆泥以前很好吃的,不知为什么这次特别干?! | 0.178 | Negative |
| 这个地段这个价位这个服务的酒店,算是很难得了。 | 0.869 | Positive |
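The polarity column in Tab. 5 follows from thresholding the predicted value. A one-function sketch is shown below, assuming a 0.5 decision threshold, which is consistent with the table but not stated explicitly.

```python
def polarity(score: float, threshold: float = 0.5) -> str:
    """Map a predicted score to a sentiment polarity label (assumed 0.5 threshold)."""
    return "positive" if score >= threshold else "negative"

for s in (0.986, 0.089, 0.178, 0.869):
    print(s, polarity(s))
```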