Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (8): 2374-2380. DOI: 10.11772/j.issn.1001-9081.2017.08.2374

• Data Science and Technology •

User identification across multiple social networks based on information entropy

WU Zheng, YU Hongtao, LIU Shuxin, ZHU Yuhang

  1. National Digital Switching System Engineering & Technological Research Center, Zhengzhou, Henan 450002, China
  • Received: 2017-02-08 Revised: 2017-03-15 Online: 2017-08-10 Published: 2017-08-12
  • About the authors: WU Zheng (born 1992), male, from Xuzhou, Jiangsu, M.S. candidate; research interests: big data analysis, complex networks. YU Hongtao (born 1970), male, from Dandong, Liaoning, research fellow, Ph.D.; research interests: big data analysis, communication and information systems. LIU Shuxin (born 1987), male, from Weifang, Shandong, assistant research fellow, Ph.D.; research interests: complex networks, link prediction. ZHU Yuhang (born 1982), male, from Xuzhou, Jiangsu, assistant research fellow, M.S.; research interest: graph mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61379151) and the National Key Technology R&D Program (2014BAH30B01).

Abstract: Subjective weighting methods ignore the specific meaning and role that each attribute has in identity matching, which leads to low identification precision. To solve this problem, an Information Entropy based Multiple Social Networks User Identification Algorithm (IE-MSNUIA) was proposed. Firstly, the data types and physical meanings of the different attributes were analyzed, and a correspondingly different similarity calculation method was adopted for each. Secondly, the weight of each attribute was determined according to its information entropy, so that the potential information of each attribute could be fully exploited. Finally, all chosen attributes were integrated to decide whether an account pair was a match. Theoretical analysis and experimental results show that, compared with machine learning based algorithms and subjective weighting algorithms, the proposed algorithm improves every performance metric: across different datasets, its average precision reaches 97.2%, its average recall reaches 94.1%, and its average comprehensive evaluation metric reaches 95.6%. The proposed algorithm can accurately identify the multiple accounts that a user holds across different social networks.

Key words: user identification, attribute similarity, information entropy, information integration, online social network
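
The weighting and fusion steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: each attribute's weight is taken to be proportional to the Shannon entropy of its observed values, the per-attribute similarities are assumed to be precomputed in [0, 1], and the fused score is a weighted sum compared against a hypothetical threshold. The function names, the toy profiles, and the threshold value are not from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(profiles, attributes):
    """Assumed scheme: weight each attribute in proportion to its entropy.

    An attribute whose values vary more across users (higher entropy)
    is treated as more informative for telling users apart.
    """
    entropies = {a: shannon_entropy([p[a] for p in profiles]) for a in attributes}
    total = sum(entropies.values())
    if total == 0:
        # Every attribute is constant: fall back to uniform weights.
        return {a: 1.0 / len(attributes) for a in attributes}
    return {a: h / total for a, h in entropies.items()}


def match_score(similarities, weights):
    """Fuse per-attribute similarities (each in [0, 1]) into one weighted score."""
    return sum(weights[a] * similarities[a] for a in weights)


def is_match(similarities, weights, threshold=0.8):
    """Declare an account pair matched when the fused score reaches the
    threshold (0.8 is a hypothetical value, not taken from the paper)."""
    return match_score(similarities, weights) >= threshold


# Toy, hypothetical profiles: usernames are all distinct, gender has two values.
profiles = [
    {"username": "alice01", "gender": "f"},
    {"username": "bob_k", "gender": "m"},
    {"username": "carol.z", "gender": "f"},
    {"username": "dave77", "gender": "m"},
]
weights = entropy_weights(profiles, ["username", "gender"])
pair_similarities = {"username": 0.9, "gender": 1.0}
print(weights, match_score(pair_similarities, weights))
```

In this toy data the username attribute carries two bits of entropy and gender carries one, so the username similarity dominates the fused score. A real implementation would also need the attribute-specific similarity measures that the abstract mentions but does not detail.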

CLC Number: