Journal of Computer Applications, 2020, Vol. 40, Issue (2): 473-478. DOI: 10.11772/j.issn.1001-9081.2019101768
• CCF NDBC 2019 •
Yang LI, Wei ZHANG, Chen PENG
Received: 2019-09-18
Revised: 2019-10-18
Accepted: 2019-10-24
Online: 2019-10-31
Published: 2020-02-10
Contact: Wei ZHANG
About author: LI Yang, born in 1994, native of Yuncheng, Shanxi, M. S. candidate. His research interests include data mining.
Yang LI, Wei ZHANG, Chen PENG. Target-dependent method for authorship attribution[J]. Journal of Computer Applications, 2020, 40(2): 473-478.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019101768
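| Symbol | Description | Symbol | Description |
|---|---|---|---|
| | Maximum text length | | Word embedding table |
| | Number of convolution kernels | | Document vector |
| | Activation function | | Product ID embedding table |
| | Two-dimensional convolution operation | | Product ID vector |
| | Word embedding dimension | | |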
Tab. 1 Symbol definition
Dataset | Number of products | Number of users | Reviews per user | Reviews per product | Total reviews |
---|---|---|---|---|---|
Movie reviews | 250 | 610 | 37.37 | 91.17 | 22 793 |
CD reviews | 600 | 800 | 51.27 | 38.45 | 30 763 |
Tab. 2 Dataset statistics
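The two per-entity averages in Tab. 2 follow from the total review count divided by the number of users or products. The following is a minimal sketch of that arithmetic for the movie-review row; the dictionary keys and the helper name `per_entity` are illustrative and do not come from the paper.

```python
# Counts copied from the movie-review row of Tab. 2.
movie = {"products": 250, "users": 610, "reviews": 22793}


def per_entity(total_reviews: int, n_entities: int) -> float:
    """Average number of reviews per entity, rounded to two decimals."""
    return round(total_reviews / n_entities, 2)


reviews_per_user = per_entity(movie["reviews"], movie["users"])
reviews_per_product = per_entity(movie["reviews"], movie["products"])
print(reviews_per_user, reviews_per_product)
```

This reproduces the printed values 37.37 and 91.17. For the CD row, the printed averages are obtained when 30 763 is divided by 600 for reviews per user and by 800 for reviews per product, so the product and user counts in that row appear to be exchanged relative to the averages; the table alone does not show which pair is misprinted.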
Name | Number of layers | Value |
---|---|---|
Maximum length L | - | 1 000 |
Vector dimension d | - | 300 |
Convolution | 3 | |
Fully connected | 1 | # of classes |
Tab. 3 Neural network architecture and hyperparameters
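Tab. 3 fixes the input size of the network: each document is represented by L = 1 000 positions with d = 300 dimensions each. A minimal sketch of how a token sequence might be brought to that fixed length is given below. The padding token `<pad>`, the choice to cut the tail of long documents, and both function names are assumptions made for illustration; the table does not specify them, and the convolution kernel sizes are left out because the table gives no values for them.

```python
from typing import List, Tuple

MAX_LEN = 1000   # maximum length L from Tab. 3
EMBED_DIM = 300  # vector dimension d from Tab. 3
PAD = "<pad>"    # hypothetical padding token, not named in the source


def pad_or_truncate(tokens: List[str], max_len: int = MAX_LEN) -> List[str]:
    """Return exactly max_len tokens: cut the tail or right-pad with PAD."""
    clipped = tokens[:max_len]
    return clipped + [PAD] * (max_len - len(clipped))


def input_shape(tokens: List[str]) -> Tuple[int, int]:
    """Shape (L, d) of the embedded document fed to the first convolution layer."""
    return (len(pad_or_truncate(tokens)), EMBED_DIM)


print(input_shape("this film was great".split()))
```

Every document, short or long, then yields the same (1000, 300) input, which is what lets a single fully connected layer with one output per author sit on top of the convolution layers.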
Method | Movie reviews: Acc | Movie reviews: R_macro | Movie reviews: F1_macro | CD reviews: Acc | CD reviews: R_macro | CD reviews: F1_macro |
---|---|---|---|---|---|---|
CNN-2 | 0.519 | 0.411 | 0.415 | 0.683 | 0.581 | 0.579 |
LSTM-1 | 0.363 | 0.262 | 0.259 | 0.464 | 0.362 | 0.363 |
SVM | 0.452 | 0.354 | 0.351 | 0.619 | 0.523 | 0.521 |
RF | 0.307 | 0.209 | 0.205 | 0.492 | 0.401 | 0.399 |
Syntax-CNN | 0.505 | 0.401 | 0.405 | 0.656 | 0.566 | 0.565 |
LDA-S | 0.285 | 0.188 | 0.186 | 0.349 | 0.251 | 0.252 |
CNN product | 0.018 | 0.006 | 0.003 | 0.012 | 0.003 | 0.004 |
Early fusion | 0.556 | 0.449 | 0.443 | 0.708 | 0.612 | 0.608 |
Late fusion | 0.569 | 0.467 | 0.465 | 0.725 | 0.621 | 0.622 |
Tab. 4 Comparison of evaluation results of different methods on two datasets
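Tab. 4 reports accuracy together with macro-averaged recall and macro-averaged F1. The sketch below shows one common way to compute these three quantities for a multi-class author classifier. It assumes that macro F1 is the unweighted mean of the per-class F1 scores; the paper's exact definition is not given here and may differ (for example, F1 computed from macro precision and macro recall). The function names and the toy labels are illustrative.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of documents assigned to the correct author."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Unweighted mean of per-author recall and per-author F1."""
    labels = sorted(set(y_true))
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn)  # tp + fn > 0 because c occurs in y_true
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    n = len(labels)
    return {"recall_macro": sum(recalls) / n, "f1_macro": sum(f1s) / n}


# Toy example with three authors.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "a"]
acc = accuracy(y_true, y_pred)
scores = macro_scores(y_true, y_pred)
print(acc, scores)
```

Because every author counts equally in the macro average, authors with few reviews pull the macro scores down more than they pull accuracy down, which is consistent with R_macro and F1_macro lying below Acc for every method in Tab. 4.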
Method | Movie reviews | CD reviews |
---|---|---|
CNN-2 | 0.519 | 0.682 |
Early fusion | 0.522 | 0.686 |
Late fusion | 0.540 | 0.706 |
Tab. 5 Impact of target-dependence information on Acc based on n-gram feature
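The early-fusion and late-fusion rows differ in where the target (product) information is combined with the text representation. The sketch below illustrates the generic distinction only and is not the authors' model: early fusion concatenates the document vector and the product ID vector before a single classifier, while late fusion scores each vector separately and then averages the two probability distributions. The linear classifiers, the averaging weight, and all names and toy values are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, x: Vector) -> Vector:
    """One score per class: each row of W dotted with x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over class scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def early_fusion(doc_vec: Vector, prod_vec: Vector, W: Matrix) -> Vector:
    """Concatenate both vectors, then apply one shared classifier."""
    return softmax(linear(W, doc_vec + prod_vec))


def late_fusion(doc_vec: Vector, prod_vec: Vector, W_doc: Matrix,
                W_prod: Matrix, weight: float = 0.5) -> Vector:
    """Classify each vector on its own, then mix the two distributions."""
    p_doc = softmax(linear(W_doc, doc_vec))
    p_prod = softmax(linear(W_prod, prod_vec))
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_doc, p_prod)]


# Toy setting: two authors, a 2-d document vector, a 1-d product ID vector.
doc_vec, prod_vec = [1.0, 0.0], [0.5]
p_early = early_fusion(doc_vec, prod_vec, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
p_late = late_fusion(doc_vec, prod_vec, [[1.0, 0.0], [0.0, 1.0]], [[0.0], [1.0]])
print(p_early, p_late)
```

In a late-fusion design of this kind the text branch and the product branch can be trained and tuned independently, which is one plausible reason for it to behave differently from early fusion in Tab. 4 to Tab. 6.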
Method | Movie reviews | CD reviews |
---|---|---|
CNN-2 | 0.548 | 0.703 |
Early fusion | 0.554 | 0.710 |
Late fusion | 0.568 | 0.725 |
Tab. 6 Impact of target-dependence information on Acc based on pre-trained feature
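The size of the effect in Tab. 5 and Tab. 6 is easiest to read as the absolute accuracy gain of each fusion variant over the CNN-2 baseline. The short sketch below computes these differences from the tabulated values; the dictionary layout and the helper name `gain` are illustrative.

```python
# Accuracy values copied from Tab. 5 and Tab. 6: (movie reviews, CD reviews).
tables = {
    "Tab. 5 (n-gram)": {
        "CNN-2": (0.519, 0.682),
        "Early fusion": (0.522, 0.686),
        "Late fusion": (0.540, 0.706),
    },
    "Tab. 6 (pre-trained)": {
        "CNN-2": (0.548, 0.703),
        "Early fusion": (0.554, 0.710),
        "Late fusion": (0.568, 0.725),
    },
}


def gain(table: dict, method: str, baseline: str = "CNN-2") -> tuple:
    """Absolute accuracy gain of a method over the baseline, per dataset."""
    return tuple(round(m - b, 3) for m, b in zip(table[method], table[baseline]))


for name, table in tables.items():
    print(name, "early:", gain(table, "Early fusion"), "late:", gain(table, "Late fusion"))
```

With both feature types, late fusion adds roughly two accuracy points over CNN-2 on each dataset, while early fusion adds less than one point.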