Regional bullying recognition based on joint hierarchical attentional network and independent recurrent neural network

doi:10.11772/j.issn.1001-9081.2019010033

Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (8): 2450-2455. DOI: 10.11772/j.issn.1001-9081.2019010033

• Frontier, Interdisciplinary and Comprehensive Applications •


MENG Zhao¹, TIAN Shengwei¹, YU Long², WANG Ruijin³

1. School of Software, Xinjiang University, Urumqi Xinjiang 830008, China;
2. Network Center, Xinjiang University, Urumqi Xinjiang 830046, China;
3. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China

Received: 2019-01-07 Revised: 2019-03-05 Online: 2019-04-15 Published: 2019-08-10
Corresponding author: TIAN Shengwei
About the authors: MENG Zhao (born 1994), female, from Changzhi, Shanxi, M.S. candidate; research interests: artificial intelligence, natural language processing. TIAN Shengwei (born 1973), male, from Urumqi, Xinjiang, Ph.D., professor, doctoral supervisor; research interests: artificial intelligence, big data analysis, information security. YU Long (born 1974), female, from Urumqi, Xinjiang, M.S., professor, doctoral supervisor; research interests: cyberspace, big data analysis, information security. WANG Ruijin (born 1980), male, from Chengdu, Sichuan, Ph.D., lecturer; research interests: quantum communication, big data analysis and security.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61563051, 61662074, 61262064), the Key Project of the National Natural Science Foundation of China (61331011), the Xinjiang Uygur Autonomous Region Scientific and Technological Personnel Training Project (QN2016YX0051), and the Xinjiang Tianshan Youth Plan Project (2017Q011).

Abstract

To make better use of the deep information in text context, a regional bullying text recognition model called HACBI (HAN_CNN_BiLSTM_IndRNN) is proposed, which combines a Hierarchical Attention Network (HAN) with an Independent Recurrent Neural Network (IndRNN). Firstly, manually annotated regional bullying texts are mapped into a low-dimensional vector space by word embedding. Secondly, a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network extract the local and global semantic features of the texts, and HAN further captures the internal structural information of the text. Finally, to avoid losing the hierarchical structure information of the text and to address problems such as vanishing gradients, IndRNN is introduced to enhance the descriptive ability of the model and to integrate the information flows. Experimental results show that the model achieves Accuracy (Acc), Precision (P), Recall (R), F1-measure (F1) and Area Under the Curve (AUC) values of 99.57%, 98.54%, 99.02%, 98.78% and 99.35% respectively, a significant improvement over text classification models such as Support Vector Machine (SVM) and CNN.

Key words: regional bullying, structural information, Hierarchical Attention Network (HAN), Independent Recurrent Neural Network (IndRNN), word vector, context
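
The abstract names IndRNN as the final stage of the model but does not give its equations. As a rough illustration only, the sketch below follows the recurrence commonly given for IndRNN, h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), in which each hidden neuron is connected only to its own previous state through a scalar weight. The function names, the toy weights and the plain-list representation are assumptions made for this example and are not taken from HACBI.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def indrnn_step(x_t: Vector, h_prev: Vector, W: Matrix, u: Vector, b: Vector) -> Vector:
    """One IndRNN update: h_t = ReLU(W x_t + u * h_prev + b), element-wise in u."""
    h_t = []
    for n in range(len(h_prev)):
        # Input contribution: row n of W dotted with the current input.
        pre = sum(w * x for w, x in zip(W[n], x_t))
        # Recurrent contribution: neuron n sees only its own previous state.
        pre += u[n] * h_prev[n] + b[n]
        h_t.append(max(0.0, pre))  # ReLU activation
    return h_t


def indrnn_forward(xs: List[Vector], W: Matrix, u: Vector, b: Vector) -> List[Vector]:
    """Run the recurrence over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(u)
    states = []
    for x_t in xs:
        h = indrnn_step(x_t, h, W, u, b)
        states.append(h)
    return states


# Toy weights (hypothetical): 2 inputs, 2 hidden neurons.
W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, -1.0]
b = [0.0, 0.0]
xs = [[1.0, 1.0], [0.0, 0.0], [0.0, 2.0]]
states = indrnn_forward(xs, W, u, b)
```

Because the recurrent weight of each neuron is a single scalar, the gradient flowing back through time for neuron n depends on powers of u[n] rather than on repeated products of a full matrix, which is the usual argument for why this formulation eases vanishing and exploding gradients.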

CLC Number:

TP391
TP181

MENG Zhao, TIAN Shengwei, YU Long, WANG Ruijin. Regional bullying recognition based on joint hierarchical attentional network and independent recurrent neural network[J]. Journal of Computer Applications, 2019, 39(8): 2450-2455.

