Offensive speech detection with irony mechanism

doi:10.11772/j.issn.1001-9081.2023040533

Abstract

Offensive speech on the internet seriously disrupts normal network order and damages the environment for healthy online communication. Existing detection techniques focus on explicit, distinctive features of the text and struggle to detect more implicit forms of attack. To address these problems, an offensive speech detection model incorporating an irony mechanism, BSWD (Bidirectional Encoder Representation from Transformers-based Sarcasm and Word Detection), was proposed. First, Sarcasm-BERT, a model based on the irony mechanism, was proposed to detect semantic conflicts in speech. Second, WordsDetect, a fine-grained word-level offensive feature extraction model, was proposed to detect offensive words in speech. Finally, BSWD was obtained by fusing these two models. Experimental results show that, compared with the BERT (Bidirectional Encoder Representation from Transformers) and HateBERT models, most of the accuracy, precision, recall, and F1 score values of the proposed model improve by about 2%; BSWD thus significantly improves detection performance and is better at detecting implicit offensive speech. Compared with the SKS (Sentiment Knowledge Sharing) and BiCHAT (Bi-LSTM with deep CNN and Hierarchical ATtention) models, BSWD has stronger generalization ability and robustness. These results verify that BSWD can effectively detect implicit offensive speech.

Key words: irony detection, offensive speech detection, fine-grained feature, implicit attack, attention mechanism
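The abstract states that BSWD fuses a sentence-level branch (Sarcasm-BERT) with a word-level branch (WordsDetect) but does not name the fusion operator. The sketch below only illustrates the general idea, assuming fusion by concatenation followed by a logistic output layer; the feature values, weights, and function names are all hypothetical and are not taken from the paper.

```python
import math


def fuse_features(sentence_features, word_features):
    """Fuse two branch outputs by concatenation (an assumed operator)."""
    return list(sentence_features) + list(word_features)


def offensive_probability(fused, weights, bias):
    """Logistic output layer over the fused feature vector."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical branch outputs for a single post.
sarcasm_features = [0.8, 0.1]  # stand-in for the Sarcasm-BERT branch
word_features = [0.6]          # stand-in for the WordsDetect branch

fused = fuse_features(sarcasm_features, word_features)
prob = offensive_probability(fused, weights=[1.5, -0.5, 2.0], bias=-1.0)
print(len(fused), round(prob, 3))
```

In a trained model the weights would be learned jointly with both branches; here they are fixed by hand only to make the example runnable.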

CLC Number:

TP391.1

Haihan WANG, Yan ZHU. Offensive speech detection with irony mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1065-1071.

Figures/Tables

Dataset	Training samples	Test samples	Total samples
OLID	13 240	860	14 100
HateBase	19 826	4 957	24 783
HateXplain	16 118	4 030	20 148
Implicit Hate Corpus	17 184	4 296	21 480
HatEval-2019	9 000	1 000	10 000

Model	ACC (%)	P (%)	R (%)	Macro-F1 (%)
BiLSTM	82.09	78.12	75.47	76.48
BiCHAT	83.49	81.61	75.14	77.39
HN-ATT	82.79	79.32	76.19	78.69
SKS	83.72	77.22	80.68	78.62
SMSD	83.72	80.90	77.13	78.69
BERT	84.67	83.67	79.02	80.83
HateBERT	85.50	83.08	79.45	80.94
BSWD	87.35	85.48	81.79	83.48
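The margin behind the reported improvement can be checked directly from the rows above. The snippet below is a small sketch that subtracts the BERT and HateBERT rows from the BSWD row; the names `RESULTS` and `gains_over` are illustrative.

```python
METRICS = ("ACC", "P", "R", "Macro-F1")

# Scores copied from the BERT, HateBERT, and BSWD rows of the table.
RESULTS = {
    "BERT": (84.67, 83.67, 79.02, 80.83),
    "HateBERT": (85.50, 83.08, 79.45, 80.94),
    "BSWD": (87.35, 85.48, 81.79, 83.48),
}


def gains_over(baseline, proposed="BSWD"):
    """Absolute gain of the proposed model over a baseline, per metric."""
    return {
        metric: round(p - b, 2)
        for metric, p, b in zip(METRICS, RESULTS[proposed], RESULTS[baseline])
    }


for baseline in ("BERT", "HateBERT"):
    print(baseline, gains_over(baseline))
```

On this table the gains range from 1.81 to 2.77 points, which is consistent with the improvement of about 2% stated in the abstract.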

Model	ACC (%)	P (%)	R (%)	Macro-F1 (%)
BiLSTM	94.14	89.69	89.77	89.73
BiCHAT	94.90	90.22	92.26	91.19
HN-ATT	95.25	91.07	92.48	91.75
SKS	94.86	94.45	89.14	91.49
SMSD	95.27	90.37	93.37	91.77
BERT	96.71	93.22	93.26	93.24
HateBERT	95.99	92.63	93.00	92.82
BSWD	97.12	94.83	94.68	94.75
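For readers unfamiliar with the Macro-F1 column, the sketch below shows how macro-averaged precision, recall, and F1 are computed for a binary task: each class is scored separately and the per-class scores are averaged with equal weight. The toy labels are hypothetical, and it is an assumption here that P and R in the tables are macro-averaged in the same way.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class precision, recall, and F1."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        per_class.append(precision_recall_f1(tp, fp, fn))
    return tuple(sum(s[i] for s in per_class) / len(labels) for i in range(3))


# Hypothetical labels: 1 = offensive, 0 = not offensive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
```

Because each class counts equally, Macro-F1 penalizes a model that does well on the majority class but poorly on the minority (offensive) class, which is why it is often reported alongside accuracy on imbalanced datasets.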