Journal of Computer Applications, 2020, Vol. 40, Issue 2: 473-478. DOI: 10.11772/j.issn.1001-9081.2019101768

• CCF NDBC 2019

Target-dependent method for authorship attribution

Yang LI1, Wei ZHANG1, Chen PENG2

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  2. Institute of Electronics, Chinese Academy of Sciences, Suzhou Jiangsu 215123, China
  • Received: 2019-09-18 Revised: 2019-10-18 Accepted: 2019-10-24 Online: 2019-10-31 Published: 2020-02-10
  • Contact: Wei ZHANG
  • About author: LI Yang, born in 1994, M.S. candidate. His research interests include data mining.
    PENG Chen, born in 1986, Ph.D., associate research fellow. His research interests include geospatial information processing.
  • Supported by:
    the Young Scientists Fund of the National Natural Science Foundation of China (61702190)

目标依赖的作者身份识别方法

李扬1, 张伟1, 彭晨2

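  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062
  2. Suzhou Research Institute, Institute of Electronics, Chinese Academy of Sciences, Suzhou, Jiangsu 215123
  • Corresponding author: 张伟
  • About the authors: 李扬 (born 1994), male, from Yuncheng, Shanxi, master's student; main research interest: data mining.
    彭晨 (born 1986), male, from Changzhou, Jiangsu, associate research fellow, Ph.D.; main research interest: spatial information processing.
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (61702190)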

Abstract:

Authorship attribution is the task of determining who wrote a given document. Traditional authorship attribution methods, however, are target-independent: they predict the author without taking any constraint into account, which does not match how the problem arises in practice. To address this issue, a Target-Dependent method for Authorship Attribution (TDAA) was proposed. Firstly, the product ID corresponding to each user review was chosen as the constraint information. Secondly, Bidirectional Encoder Representations from Transformers (BERT) was used to extract pre-trained features of the review text, making the text modeling process more general. Thirdly, a Convolutional Neural Network (CNN) was used to extract deep features of the text. Finally, two fusion methods were proposed to combine the two kinds of information. Experimental results on the Amazon Movie_and_TV and CDs_and_Vinyl_5 datasets show that the proposed method improves accuracy by 4%-5% over the comparison methods.

Key words: authorship attribution, target-dependent, Convolutional Neural Network (CNN), information fusion, pre-trained language model
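
The pipeline outlined in the abstract (text features, CNN feature extraction, fusion with a product-ID representation, author prediction) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: random vectors stand in for BERT token features, all weights are untrained, and the two fusion functions shown (concatenation and sigmoid gating) are hypothetical choices, since the abstract does not say which two fusion methods TDAA uses. All names (`conv_max_pool`, `fuse_concat`, `fuse_gate`, `predict`, the product ID `"P1"`) are made up for illustration.

```python
import math
import random

def conv_max_pool(tokens, filters, width):
    """1-D convolution over token vectors, ReLU, then max-over-time pooling."""
    pooled = []
    for filt in filters:  # each filter is a flat weight list of length width * dim
        best = 0.0  # ReLU output is never below zero
        for start in range(len(tokens) - width + 1):
            window = [x for tok in tokens[start:start + width] for x in tok]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)
    return pooled

def fuse_concat(text_vec, target_vec):
    """Assumed fusion 1: concatenate text features and product-ID embedding."""
    return text_vec + target_vec

def fuse_gate(text_vec, target_vec):
    """Assumed fusion 2: scale each text feature by a sigmoid gate from the product embedding."""
    return [t / (1.0 + math.exp(-g)) for t, g in zip(text_vec, target_vec)]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(tokens, product_id, params, fuse, out_weights):
    """Return a probability distribution over candidate authors."""
    text_vec = conv_max_pool(tokens, params["filters"], params["width"])
    target_vec = params["product_table"][product_id]
    fused = fuse(text_vec, target_vec)
    logits = [sum(w * x for w, x in zip(row, fused)) for row in out_weights]
    return softmax(logits)

# Toy setup: every number below is an arbitrary illustrative choice.
rng = random.Random(0)
def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

dim, width, n_filters, n_authors = 8, 3, 4, 3
params = {
    "width": width,
    "filters": [rand_vec(width * dim) for _ in range(n_filters)],
    # Product embedding has the same size as the pooled text vector so gating is well defined.
    "product_table": {"P1": rand_vec(n_filters)},
}
tokens = [rand_vec(dim) for _ in range(6)]  # stand-in for BERT token features
out_concat = [rand_vec(2 * n_filters) for _ in range(n_authors)]
out_gate = [rand_vec(n_filters) for _ in range(n_authors)]

probs_concat = predict(tokens, "P1", params, fuse_concat, out_concat)
probs_gate = predict(tokens, "P1", params, fuse_gate, out_gate)
```

Both calls return one probability per candidate author. The point of the sketch is only that the same review text can yield different author distributions once the product ID enters the prediction, which is what distinguishes a target-dependent method from a target-independent one.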

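Abstract (Chinese version):

The authorship attribution task aims to determine the author of a document. However, existing authorship attribution methods are all target-independent, meaning that they assume no constraints when predicting the author, which does not match real situations. To solve the problem of authorship attribution under constraints, a target-dependent authorship attribution method, TDAA, is proposed. First, the product ID corresponding to a user review is used as the constraint information. Next, to make the text modeling process more general, BERT is used to extract pre-trained features of the review text. Then, a Convolutional Neural Network (CNN) is used for deep text feature extraction. Finally, to fuse the two different kinds of information, two different fusion approaches are discussed. Experimental results on two datasets, Amazon movie reviews (Amazon Movie_and_TV) and CD reviews (CDs_and_Vinyl_5), show that the proposed method improves on the comparison methods by 4%~5% on the precision metric.

Key words (Chinese version): authorship attribution, target-dependent, convolutional neural network, information fusion, pre-trained language model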

CLC Number: