Journal of Computer Applications, 2024, Vol. 44, Issue (1): 242-251. DOI: 10.11772/j.issn.1001-9081.2023010031

• Cyber security

User plagiarism identification scheme in social network under blockchain

Li LI¹, Chunyan YANG², Jiangwen ZHU², Ronglei HU¹

  1. College of Electronic and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100071, China
    2. College of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
  • Received: 2023-01-15; Revised: 2023-04-28; Accepted: 2023-05-12; Online: 2023-06-06; Published: 2024-01-10
  • Contact: Li LI
  • About the authors: LI Li, born in 1974, from Qingdao, Shandong, Ph. D., professor. Her research interests include network and system security, embedded security.
    YANG Chunyan, born in 1998, from Zhoukou, Henan, M. S. candidate. Her research interests include information security, blockchain.
    ZHU Jiangwen, born in 1997, from Anqing, Anhui, M. S. candidate. His research interests include cryptography, information security.
    HU Ronglei, born in 1977, from Hengshui, Hebei, Ph. D., associate research fellow. His research interests include network information security, blockchain security.
  • Supported by:
    Fundamental Research Funds for the Central Universities (3282023017)

Abstract:

To address the difficulty of identifying user plagiarism in social networks, and to protect the rights of original authors while holding plagiarizing users accountable, a plagiarism identification scheme for social network users under blockchain was proposed. Because existing blockchains lack a universal traceability model, a blockchain-based traceability information management model was designed to record user operation information and provide a basis for text similarity detection. Based on the Merkle tree and Bloom filter structures, a new index structure, BHMerkle, was designed, which reduces the computational overhead of block construction and querying and enables rapid location of transactions. In addition, a multi-feature weighted Simhash algorithm was proposed to improve the accuracy of word weight calculation and the efficiency of the signature matching stage. In this way, malicious users who plagiarize can be identified, and malicious behavior can be curbed through a reward and punishment mechanism. On news datasets covering different topics, the plagiarism detection scheme achieved an average precision of 94.8% and an average recall of 88.3%. Compared with the multi-dimensional Simhash algorithm and the Simhash algorithm based on information entropy weighting (E-Simhash), the average precision was increased by 6.19 and 4.01 percentage points respectively, and the average recall by 3.12 and 2.92 percentage points respectively. Experimental results show that the proposed scheme improves the efficiency of querying and detecting plagiarized text and identifies plagiarism with high accuracy.
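The abstract does not describe how BHMerkle combines its two components. As a rough illustration only, the following sketch pairs a standard Merkle root with a per-block Bloom filter so that a transaction lookup can skip any block whose filter rules the transaction out. The names `BloomFilter`, `Block`, `merkle_root`, and `locate`, and the parameters `m=1024` and `k=4`, are hypothetical choices for this sketch, not the paper's design.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class BloomFilter:
    """Standard Bloom filter: m bits, k SHA-256-derived positions (hypothetical sizes)."""

    def __init__(self, m: int = 1024, k: int = 4):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = sha256(i.to_bytes(4, "big") + item)
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def merkle_root(leaves) -> bytes:
    """Merkle root over hashed leaves; the last node is duplicated on odd-sized levels."""
    level = [sha256(leaf) for leaf in leaves]
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class Block:
    """Hypothetical block: transactions, their Merkle root, and a Bloom filter over them."""

    def __init__(self, txs):
        self.txs = list(txs)
        self.root = merkle_root(self.txs)
        self.bloom = BloomFilter()
        for tx in self.txs:
            self.bloom.add(tx)


def locate(chain, tx: bytes):
    """Return (block_index, tx_index), or None if tx is in no block."""
    for b_idx, block in enumerate(chain):
        if not block.bloom.might_contain(tx):
            continue  # filter says "definitely absent": skip scanning this block
        if tx in block.txs:  # confirm, since the filter can give false positives
            return b_idx, block.txs.index(tx)
    return None
```

Because a Bloom filter never yields false negatives, skipping a block on a negative answer is always safe; a positive answer still needs an explicit check (here a list membership test; in a real chain, typically a Merkle proof against `root`).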

Key words: blockchain, plagiarism identification, Simhash algorithm, similarity detection, social network
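The abstract names a multi-feature weighted Simhash but does not give its weighting formula. The following is a minimal sketch of the standard weighted Simhash fingerprint and Hamming-distance comparison that such a scheme builds on. The weighting function `tf_weights` (term frequency, doubled for title terms), the 64-bit MD5-derived token hash, and the distance threshold of 3 are assumptions for illustration, not the paper's parameters.

```python
import hashlib
from collections import Counter


def token_hash(token: str, bits: int = 64) -> int:
    # Stable token hash from the first bits//8 bytes of MD5 (bits: multiple of 8, <= 128).
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def weighted_simhash(weights: dict, bits: int = 64) -> int:
    """Classic Simhash: each token adds +w or -w to every bit position."""
    acc = [0.0] * bits
    for token, w in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            acc[i] += w if (h >> i) & 1 else -w
    fingerprint = 0
    for i, v in enumerate(acc):
        if v > 0:  # positive total vote sets the bit
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def tf_weights(tokens, title_tokens=()):
    """Hypothetical multi-feature weight: term frequency, doubled for title terms."""
    tf = Counter(tokens)
    return {t: c * (2.0 if t in title_tokens else 1.0) for t, c in tf.items()}


def looks_plagiarized(fp_a: int, fp_b: int, threshold: int = 3) -> bool:
    # threshold=3 on 64-bit fingerprints is an assumed value, not the paper's.
    return hamming(fp_a, fp_b) <= threshold
```

Weighting matters because heavier tokens dominate the sign of each bit position, so two posts that share their highest-weight terms tend to land within a small Hamming distance even when minor words differ.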

