Journal of Computer Applications, 2022, Vol. 42, Issue 3: 770-777. DOI: 10.11772/j.issn.1001-9081.2021040791
• 2021 CCF Conference on Artificial Intelligence (CCFAI 2021) •
Jian ZHANG, Ke YAN, Xiang MA
Received: 2021-05-17
Revised: 2021-06-04
Accepted: 2021-06-09
Online: 2021-11-09
Published: 2022-03-10
Contact: Ke YAN
About author: ZHANG Jian, born in 1997 in Gao'an, Jiangxi, M. S. candidate. His research interests include text classification and emotion recognition based on multi-task learning.
Jian ZHANG, Ke YAN, Xiang MA. Analysis of complex spam filtering algorithm based on neural network[J]. Journal of Computer Applications, 2022, 42(3): 770-777.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021040791
Tab. 2 Some spam difficult to identify

| Dataset | Text |
|---|---|
| SMS Spam | CALL 09090900040 & LISTEN TO EXTREME DIRTY LIVE CHAT GOING ON IN THE OFFICE RIGHT NOW TOTAL PRIVACY NO ONE KNOWS YOUR [sic] LISTENING 60P MIN |
| | Hungry gay guys feeling hungry and up 4 it, now. Call 08718730555 just 10p/min. To stop texts call 08712460324 (10p/min) |
| | (Bank of Granite issues Strong-Buy) EXPLOSIVE PICK FOR OUR MEMBERS *****UP OVER 300% *********** Nasdaq Symbol CDGT That is a $5.00 per.. |
| Ads Spam | facial lines along with loose skin color could be enhanced by a single skin care product. Elliskin The idea is included with Supplements C as well as some various other needed nutritional requirements along with healthy antioxidants distinguished for cor |
| | Albuminoidal is what ultimately conceals the age spots and collectively the discoloration of your skin. It additionally aids in adjustment the skin so on deflate wrinkles. On exploitation of times many of its users have according that they give the impre |
| Email Spam | Norton AD ATTENTION: This is a MUST for ALL Computer Users!!! *NEW - Special Package Deal!* …… |
Tab. 3 Classification results of traditional methods, current mainstream methods, and neural network methods on three datasets (P = Precision, R = Recall)

| Method | Classifier | Class | SMS P | SMS R | SMS F1 | Ads P | Ads R | Ads F1 | Email P | Email R | Email F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Traditional | NB | Spam | 0.825 | 0.867 | 0.846 | 0.964 | 0.900 | 0.931 | 0.851 | 0.950 | 0.898 |
| | | Ham | 0.860 | 0.817 | 0.838 | 0.906 | 0.967 | 0.935 | 0.943 | 0.833 | 0.885 |
| | RF | Spam | 0.918 | 0.750 | 0.826 | 0.938 | 1.000 | 0.968 | 0.866 | 0.967 | 0.913 |
| | | Ham | 0.789 | 0.933 | 0.855 | 1.000 | 0.933 | 0.966 | 0.962 | 0.850 | 0.903 |
| | SVM | Spam | 0.942 | 0.817 | 0.875 | 0.935 | 0.967 | 0.951 | 0.965 | 0.917 | 0.940 |
| | | Ham | 0.838 | 0.950 | 0.891 | 0.966 | 0.933 | 0.949 | 0.921 | 0.967 | 0.943 |
| | LR | Spam | 0.940 | 0.783 | 0.855 | 0.951 | 0.967 | 0.959 | 0.921 | 0.967 | 0.943 |
| | | Ham | 0.814 | 0.950 | 0.877 | 0.966 | 0.950 | 0.958 | 0.965 | 0.917 | 0.940 |
| | DT | Spam | 0.843 | 0.717 | 0.775 | 0.966 | 0.933 | 0.949 | 0.786 | 0.917 | 0.846 |
| | | Ham | 0.754 | 0.867 | 0.806 | 0.935 | 0.967 | 0.951 | 0.900 | 0.750 | 0.818 |
| Current mainstream | DPCNN | Spam | 0.965 | 0.917 | 0.940 | 0.952 | 1.000 | 0.976 | 0.930 | 0.993 | 0.906 |
| | | Ham | 0.921 | 0.967 | 0.943 | 1.000 | 0.950 | 0.974 | 0.889 | 0.933 | 0.911 |
| | BERT | Spam | 0.931 | 0.900 | 0.915 | 0.967 | 0.983 | 0.975 | 0.944 | 0.850 | 0.895 |
| | | Ham | 0.903 | 0.933 | 0.918 | 0.983 | 0.967 | 0.975 | 0.864 | 0.950 | 0.905 |
| | TinyBERT | Spam | 0.903 | 0.933 | 0.918 | 0.967 | 0.967 | 0.967 | 0.906 | 0.800 | 0.850 |
| | | Ham | 0.931 | 0.900 | 0.915 | 0.967 | 0.967 | 0.967 | 0.821 | 0.917 | 0.866 |
| Neural network | TextCNN | Spam | 0.967 | 0.967 | 0.967 | 0.984 | 1.000 | 0.992 | 0.967 | 0.983 | 0.975 |
| | | Ham | 0.967 | 0.967 | 0.967 | 1.000 | 0.983 | 0.992 | 0.983 | 0.967 | 0.975 |
| | TextRNN | Spam | 0.983 | 0.983 | 0.983 | 0.967 | 0.983 | 0.975 | — | — | — |
| | | Ham | 0.983 | 0.983 | 0.983 | 0.983 | 0.967 | 0.975 | — | — | — |
| | TextRCNN | Spam | 0.952 | 0.983 | 0.967 | 0.968 | 1.000 | 0.984 | 0.968 | 1.000 | 0.984 |
| | | Ham | 0.983 | 0.950 | 0.966 | 1.000 | 0.967 | 0.983 | 1.000 | 0.967 | 0.983 |
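The per-class Precision, Recall, and F1-Score values reported in Tab. 3 follow the standard definitions over true/false positives and negatives. A minimal sketch of that computation (the function and label names are illustrative, not from the paper):

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Per-class precision, recall and F1 for the given positive class.

    tp: positives correctly flagged; fp: negatives wrongly flagged;
    fn: positives missed by the classifier.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Computing the same three numbers with `positive="ham"` yields the Ham rows of the table; the Spam and Ham scores differ because the two classes trade off each other's false positives and false negatives.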
Tab. 4 Confusion matrices and AUC values of traditional methods, current mainstream methods, and neural network methods on three datasets (confusion-matrix rows are normalized over true labels; AUC is listed once per dataset)

| Method | Classifier | Class | SMS Spam | SMS Ham | SMS AUC | Ads Spam | Ads Ham | Ads AUC | Email Spam | Email Ham | Email AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Traditional | NB | Spam | 0.867 | 0.133 | 0.842 | 0.900 | 0.100 | 0.933 | 0.950 | 0.050 | 0.892 |
| | | Ham | 0.183 | 0.817 | | 0.033 | 0.967 | | 0.167 | 0.833 | |
| | RF | Spam | 0.750 | 0.250 | 0.842 | 1.000 | 0.000 | 0.967 | 0.967 | 0.033 | 0.908 |
| | | Ham | 0.067 | 0.933 | | 0.067 | 0.933 | | 0.150 | 0.850 | |
| | SVM | Spam | 0.817 | 0.183 | 0.883 | 0.967 | 0.033 | 0.950 | 0.917 | 0.083 | 0.942 |
| | | Ham | 0.050 | 0.950 | | 0.067 | 0.933 | | 0.033 | 0.967 | |
| | LR | Spam | 0.783 | 0.217 | 0.867 | 0.967 | 0.033 | 0.958 | 0.967 | 0.033 | 0.942 |
| | | Ham | 0.050 | 0.950 | | 0.050 | 0.950 | | 0.083 | 0.917 | |
| | DT | Spam | 0.717 | 0.283 | 0.792 | 0.933 | 0.067 | 0.950 | 0.917 | 0.083 | 0.833 |
| | | Ham | 0.133 | 0.867 | | 0.033 | 0.967 | | 0.250 | 0.750 | |
| Current mainstream | DPCNN | Spam | 0.917 | 0.083 | 0.942 | 1.000 | 0.000 | 0.975 | 0.883 | 0.117 | 0.908 |
| | | Ham | 0.033 | 0.967 | | 0.050 | 0.950 | | 0.067 | 0.933 | |
| | BERT | Spam | 0.900 | 0.100 | 0.917 | 0.983 | 0.017 | 0.975 | 0.850 | 0.150 | 0.900 |
| | | Ham | 0.067 | 0.933 | | 0.033 | 0.967 | | 0.050 | 0.950 | |
| | TinyBERT | Spam | 0.933 | 0.067 | 0.917 | 0.967 | 0.033 | 0.967 | 0.800 | 0.200 | 0.858 |
| | | Ham | 0.100 | 0.900 | | 0.033 | 0.967 | | 0.083 | 0.917 | |
| Neural network | TextCNN | Spam | 0.967 | 0.033 | 0.967 | 1.000 | 0.000 | 0.992 | 0.983 | 0.017 | 0.975 |
| | | Ham | 0.033 | 0.967 | | 0.017 | 0.983 | | 0.033 | 0.967 | |
| | TextRNN | Spam | 0.983 | 0.017 | 0.983 | 0.983 | 0.017 | 0.975 | — | — | — |
| | | Ham | 0.017 | 0.983 | | 0.033 | 0.967 | | — | — | |
| | TextRCNN | Spam | 0.983 | 0.017 | 0.967 | 1.000 | 0.000 | 0.983 | 1.000 | 0.000 | 0.983 |
| | | Ham | 0.050 | 0.950 | | 0.033 | 0.967 | | 0.033 | 0.967 | |
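Tab. 4 reports row-normalized confusion matrices (each true-label row sums to 1) together with an AUC per dataset. A hedged sketch of both computations follows; note the rank-based (Mann-Whitney) AUC estimate needs per-sample scores, which the table itself does not contain, and all names here are illustrative:

```python
def normalized_confusion(y_true, y_pred, labels=("Spam", "Ham")):
    """Row-normalized confusion matrix: entry [t][p] is the fraction
    of true-label-t samples predicted as label p."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = {}
    for t in labels:
        total = sum(counts[t].values()) or 1  # guard against an absent class
        matrix[t] = {p: counts[t][p] / total for p in labels}
    return matrix

def auc_score(y_true, scores, positive="Spam"):
    """Mann-Whitney estimate of AUC: the probability that a random
    positive sample is scored above a random negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The diagonal of the normalized matrix equals each class's recall, which is why the Spam diagonals here match the Recall columns of Tab. 3.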
Tab. 5 Running times of current mainstream methods and neural network methods on three datasets

| Method | Classifier | SMS dataset | Ads dataset | Email dataset |
|---|---|---|---|---|
| Current mainstream | DPCNN | 3.4 | 37.1 | 1098.1 |
| | BERT | 61.6 | 358.1 | 320.0 |
| | Tiny-BERT | 17.8 | 77.7 | 78.1 |
| Neural network | TextCNN | 2.2 | 6.0 | 138.4 |
| | TextRNN | 1.9 | 11.3 | — |
| | TextRCNN | 1.9 | 12.7 | 869.0 |
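The running times in Tab. 5 are wall-clock measurements (the excerpt does not state the unit or whether training, inference, or both are timed). One plausible way such per-dataset timings could be collected — the helper name is ours, not from the paper:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds).

    perf_counter is monotonic and high-resolution, so it is the usual
    choice for benchmarking short runs.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `timed(train_and_evaluate, "SMS")` (with a hypothetical `train_and_evaluate`) would produce the kind of per-dataset number shown above.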