Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 770-777. DOI: 10.11772/j.issn.1001-9081.2021040791
Special Issue: Artificial Intelligence; 2021 CCF Conference on Artificial Intelligence (CCFAI 2021)
Jian ZHANG, Ke YAN, Xiang MA
Received: 2021-05-17
Revised: 2021-06-04
Accepted: 2021-06-09
Online: 2021-11-09
Published: 2022-03-10
Contact: Ke YAN
About author: ZHANG Jian, born in 1997 in Gao'an, Jiangxi, M. S. candidate. His research interests include text classification and sentiment recognition based on multi-task learning.
Jian ZHANG, Ke YAN, Xiang MA. Analysis of complex spam filtering algorithm based on neural network[J]. Journal of Computer Applications, 2022, 42(3): 770-777.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021040791
Dataset | Text
---|---
SMS Spam | CALL 09090900040 & LISTEN TO EXTREME DIRTY LIVE CHAT GOING ON IN THE OFFICE RIGHT NOW TOTAL PRIVACY NO ONE KNOWS YOUR [sic] LISTENING 60P MIN
 | Hungry gay guys feeling hungry and up 4 it, now. Call 08718730555 just 10p/min. To stop texts call 08712460324 (10p/min)
 | (Bank of Granite issues Strong-Buy) EXPLOSIVE PICK FOR OUR MEMBERS *****UP OVER 300% *********** Nasdaq Symbol CDGT That is a $5.00 per..
Ads Spam | facial lines along with loose skin color could be enhanced by a single skin care product. Elliskin The idea is included with Supplements C as well as some various other needed nutritional requirements along with healthy antioxidants distinguished for cor
 | Albuminoidal is what ultimately conceals the age spots and collectively the discoloration of your skin. It additionally aids in adjustment the skin so on deflate wrinkles. On exploitation of times many of its users have according that they give the impre
Email Spam | Norton AD ATTENTION: This is a MUST for ALL Computer Users!!! *NEW - Special Package Deal!* ……
Tab. 2 Examples of spam that are difficult to identify
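The examples in Tab. 2 are hard for traditional filters precisely because they mix obfuscated spellings with surface cues such as premium-rate numbers and per-minute prices. As an illustration only (the patterns below are assumptions, not rules from this study), a naive rule-based check of the kind the paper's "traditional methods" generalize might look like:

```python
import re

# Hypothetical surface-cue patterns; illustrative, not from the original paper.
PREMIUM_NUMBER = re.compile(r"\b0[89]\d{8,9}\b")          # UK-style premium-rate number
RATE_HINT = re.compile(r"\b\d+p(?:/min| min)\b", re.IGNORECASE)  # e.g. "10p/min", "60P MIN"

def looks_like_sms_spam(text: str) -> bool:
    """Flag a message that contains a premium-rate number or a per-minute rate."""
    return bool(PREMIUM_NUMBER.search(text) or RATE_HINT.search(text))
```

Such hand-written rules catch the obvious SMS examples above but say nothing about the obfuscated Ads Spam texts, which is why the comparison turns to learned classifiers.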
Method | Classifier | Class | SMS Precision | SMS Recall | SMS F1 | Ads Precision | Ads Recall | Ads F1 | Email Precision | Email Recall | Email F1
---|---|---|---|---|---|---|---|---|---|---|---
Traditional methods | NB | Spam | 0.825 | 0.867 | 0.846 | 0.964 | 0.900 | 0.931 | 0.851 | 0.950 | 0.898
 | | Ham | 0.860 | 0.817 | 0.838 | 0.906 | 0.967 | 0.935 | 0.943 | 0.833 | 0.885
 | RF | Spam | 0.918 | 0.750 | 0.826 | 0.938 | 1.000 | 0.968 | 0.866 | 0.967 | 0.913
 | | Ham | 0.789 | 0.933 | 0.855 | 1.000 | 0.933 | 0.966 | 0.962 | 0.850 | 0.903
 | SVM | Spam | 0.942 | 0.817 | 0.875 | 0.935 | 0.967 | 0.951 | 0.965 | 0.917 | 0.940
 | | Ham | 0.838 | 0.950 | 0.891 | 0.966 | 0.933 | 0.949 | 0.921 | 0.967 | 0.943
 | LR | Spam | 0.940 | 0.783 | 0.855 | 0.951 | 0.967 | 0.959 | 0.921 | 0.967 | 0.943
 | | Ham | 0.814 | 0.950 | 0.877 | 0.966 | 0.950 | 0.958 | 0.965 | 0.917 | 0.940
 | DT | Spam | 0.843 | 0.717 | 0.775 | 0.966 | 0.933 | 0.949 | 0.786 | 0.917 | 0.846
 | | Ham | 0.754 | 0.867 | 0.806 | 0.935 | 0.967 | 0.951 | 0.900 | 0.750 | 0.818
Current mainstream methods | DPCNN | Spam | 0.965 | 0.917 | 0.940 | 0.952 | 1.000 | 0.976 | 0.930 | 0.883 | 0.906
 | | Ham | 0.921 | 0.967 | 0.943 | 1.000 | 0.950 | 0.974 | 0.889 | 0.933 | 0.911
 | BERT | Spam | 0.931 | 0.900 | 0.915 | 0.967 | 0.983 | 0.975 | 0.944 | 0.850 | 0.895
 | | Ham | 0.903 | 0.933 | 0.918 | 0.983 | 0.967 | 0.975 | 0.864 | 0.950 | 0.905
 | TinyBERT | Spam | 0.903 | 0.933 | 0.918 | 0.967 | 0.967 | 0.967 | 0.906 | 0.800 | 0.850
 | | Ham | 0.931 | 0.900 | 0.915 | 0.967 | 0.967 | 0.967 | 0.821 | 0.917 | 0.866
Neural network methods | TextCNN | Spam | 0.967 | 0.967 | 0.967 | 0.984 | 1.000 | 0.992 | 0.967 | 0.983 | 0.975
 | | Ham | 0.967 | 0.967 | 0.967 | 1.000 | 0.983 | 0.992 | 0.983 | 0.967 | 0.975
 | TextRNN | Spam | 0.983 | 0.983 | 0.983 | 0.967 | 0.983 | 0.975 | — | — | —
 | | Ham | 0.983 | 0.983 | 0.983 | 0.983 | 0.967 | 0.975 | — | — | —
 | TextRCNN | Spam | 0.952 | 0.983 | 0.967 | 0.968 | 1.000 | 0.984 | 0.968 | 1.000 | 0.984
 | | Ham | 0.983 | 0.950 | 0.966 | 1.000 | 0.967 | 0.983 | 1.000 | 0.967 | 0.983
Tab. 3 Classification results of traditional methods, current methods, and neural network methods on three datasets
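The per-class scores in Tab. 3 follow the standard precision/recall/F1 definitions. A minimal pure-Python sketch (not the authors' evaluation code) reproduces the Naive Bayes SMS row, with counts inferred from the rates in Tab. 4 under the assumption of a balanced 60-message-per-class test split:

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1-score for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# NB on the SMS dataset, spam class: 52 spam correctly flagged,
# 11 ham misfiled as spam, 8 spam missed (assumed 60/60 split).
p, r, f = prf(tp=52, fp=11, fn=8)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.825 0.867 0.846
```

The rounded values match the NB/SMS/Spam row of Tab. 3, which is a useful sanity check on how the table was read.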
Method | Classifier | True class | SMS Spam | SMS Ham | SMS AUC | Ads Spam | Ads Ham | Ads AUC | Email Spam | Email Ham | Email AUC
---|---|---|---|---|---|---|---|---|---|---|---
Traditional methods | NB | Spam | 0.867 | 0.133 | 0.842 | 0.900 | 0.100 | 0.933 | 0.950 | 0.050 | 0.892
 | | Ham | 0.183 | 0.817 | | 0.033 | 0.967 | | 0.167 | 0.833 |
 | RF | Spam | 0.750 | 0.250 | 0.842 | 1.000 | 0.000 | 0.967 | 0.967 | 0.033 | 0.908
 | | Ham | 0.067 | 0.933 | | 0.067 | 0.933 | | 0.150 | 0.850 |
 | SVM | Spam | 0.817 | 0.183 | 0.883 | 0.967 | 0.033 | 0.950 | 0.917 | 0.083 | 0.942
 | | Ham | 0.050 | 0.950 | | 0.067 | 0.933 | | 0.033 | 0.967 |
 | LR | Spam | 0.783 | 0.217 | 0.867 | 0.967 | 0.033 | 0.958 | 0.967 | 0.033 | 0.942
 | | Ham | 0.050 | 0.950 | | 0.050 | 0.950 | | 0.083 | 0.917 |
 | DT | Spam | 0.717 | 0.283 | 0.792 | 0.933 | 0.067 | 0.950 | 0.917 | 0.083 | 0.833
 | | Ham | 0.133 | 0.867 | | 0.033 | 0.967 | | 0.250 | 0.750 |
Current mainstream methods | DPCNN | Spam | 0.917 | 0.083 | 0.942 | 1.000 | 0.000 | 0.975 | 0.883 | 0.117 | 0.908
 | | Ham | 0.033 | 0.967 | | 0.050 | 0.950 | | 0.067 | 0.933 |
 | BERT | Spam | 0.900 | 0.100 | 0.917 | 0.983 | 0.017 | 0.975 | 0.850 | 0.150 | 0.900
 | | Ham | 0.067 | 0.933 | | 0.033 | 0.967 | | 0.050 | 0.950 |
 | TinyBERT | Spam | 0.933 | 0.067 | 0.917 | 0.967 | 0.033 | 0.967 | 0.800 | 0.200 | 0.858
 | | Ham | 0.100 | 0.900 | | 0.033 | 0.967 | | 0.083 | 0.917 |
Neural network methods | TextCNN | Spam | 0.967 | 0.033 | 0.967 | 1.000 | 0.000 | 0.992 | 0.983 | 0.017 | 0.975
 | | Ham | 0.033 | 0.967 | | 0.017 | 0.983 | | 0.033 | 0.967 |
 | TextRNN | Spam | 0.983 | 0.017 | 0.983 | 0.983 | 0.017 | 0.975 | — | — | —
 | | Ham | 0.017 | 0.983 | | 0.033 | 0.967 | | — | — |
 | TextRCNN | Spam | 0.983 | 0.017 | 0.967 | 1.000 | 0.000 | 0.983 | 1.000 | 0.000 | 0.983
 | | Ham | 0.050 | 0.950 | | 0.033 | 0.967 | | 0.033 | 0.967 |
Tab. 4 Confusion matrices and AUC values of traditional methods, current methods, and neural network methods on three datasets
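Each AUC in Tab. 4 coincides with the balanced accuracy (TPR + TNR)/2 of the corresponding row-normalized confusion matrix, which is the AUC of a classifier evaluated at a single hard-label operating point. The following small check is an observation about the tabulated numbers, not the authors' evaluation code, and again assumes a balanced 60-message-per-class test split:

```python
from fractions import Fraction

def hard_label_auc(tp: int, fn: int, fp: int, tn: int) -> float:
    """AUC of one hard-threshold classifier: (TPR + TNR) / 2."""
    tpr = Fraction(tp, tp + fn)   # spam correctly flagged
    tnr = Fraction(tn, tn + fp)   # ham correctly passed
    return float((tpr + tnr) / 2)

# NB on SMS: spam row (52, 8), ham row (11, 49) under the assumed 60/60 split.
print(round(hard_label_auc(52, 8, 11, 49), 3))  # 0.842, as in Tab. 4
```

Exact fractions are used because rounding the rates first (e.g. 0.867 and 0.817) can shift the third decimal place away from the tabulated value.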
Method | Classifier | Running time (SMS) | Running time (Ads) | Running time (Email)
---|---|---|---|---
Current mainstream methods | DPCNN | 3.4 | 37.1 | 1 098.1
 | BERT | 61.6 | 358.1 | 320.0
 | TinyBERT | 17.8 | 77.7 | 78.1
Neural network methods | TextCNN | 2.2 | 6.0 | 138.4
 | TextRNN | 1.9 | 11.3 | —
 | TextRCNN | 1.9 | 12.7 | 869.0
Tab. 5 Running times of traditional methods, current methods and neural network methods on three datasets
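Running-time figures such as those in Tab. 5 are typically wall-clock measurements over a fixed batch of test messages. A minimal harness of that kind, as an assumed sketch rather than the paper's actual measurement code (the units of Tab. 5 are not stated in this excerpt):

```python
import time

def run_time(predict, samples) -> float:
    """Wall-clock time to classify a batch of samples with `predict`."""
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return time.perf_counter() - start

# Hypothetical stand-in classifier; any model's predict function fits here.
elapsed = run_time(lambda text: "spam" in text, ["example message"] * 1000)
```

Note that such timings conflate model inference with preprocessing and hardware effects, which is one reason BERT's time does not scale monotonically with dataset size in Tab. 5.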