Journal of Computer Applications, 2014, Vol. 34, Issue (9): 2604-2607. DOI: 10.11772/j.issn.1001-9081.2014.09.2604

• Data Technology •

异构信息网中基于元路径的动态相似性搜索

陈湘涛,丁平尖,王晶   

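  1. College of Information Science and Engineering, Hunan University, Changsha 410082
  • Received: 2014-03-21  Revised: 2014-05-16  Online: 2014-09-01  Published: 2014-09-30
  • Corresponding author: DING Pingjian
  • About the authors:
    CHEN Xiangtao (born 1974), male, from Shaoyang, Hunan; associate professor, Ph.D., CCF member; main research interest: data mining;
    DING Pingjian (born 1990), male, from Hengyang, Hunan; master's student; main research interest: data mining;
    WANG Jing (born 1989), female, from Shaoyang, Hunan; master's student; main research interest: data mining.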

Meta path-based dynamic similarity search in heterogeneous information network

CHEN Xiangtao, DING Pingjian, WANG Jing

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha Hunan 410082, China
  • Received: 2014-03-21  Revised: 2014-05-16  Online: 2014-09-01  Published: 2014-09-30
  • Contact: DING Pingjian

Abstract:

Existing similarity search algorithms usually ignore the time factor. To address this, PDSim, a meta-path-based dynamic similarity search algorithm for heterogeneous information networks, is proposed. PDSim first computes the link matrix of the objects under a given meta path to obtain the ratio of meta-path instance counts between objects, and computes a degree of time difference from the differences in establishment time. From these two quantities it derives a dynamic similarity measure under the given meta path. In several similarity search examples, PDSim captured the changes in an object's interests over time. When applied to clustering, it improved clustering accuracy, measured by Normalized Mutual Information (NMI), by 0.17% to 9.24% over the PathSim (Meta Path-Based Similarity) and PCRW (Path-Constrained Random Walks) methods. The experimental results show that, compared with traditional link-based similarity search algorithms, PDSim significantly improves the efficiency of dynamic similarity search and user satisfaction in heterogeneous information networks, and that it is suited to similarity search over objects that change dynamically over time.
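The abstract does not give the PDSim formula, so the following is only a rough sketch of the general idea, not the authors' method. It uses the PathSim normalization, 2·c(x,y) / (c(x,x) + c(y,y)), over an author-venue-author meta path, and adds a hypothetical exponential penalty on the time gap between the two links of each path instance. The names `weighted_count`, `similarity`, and `decay`, the `(venue, year)` link format, and the sample data are all assumptions made for illustration.

```python
from math import exp


def weighted_count(links_x, links_y, decay):
    """Sum a weight over every path instance x -> venue -> y.

    Each link is a (venue, year) pair. With decay == 0 every instance
    weighs 1, so the result is the plain path-instance count.
    The exponential time penalty is an illustrative assumption,
    not the formula from the paper.
    """
    return sum(
        exp(-decay * abs(year_x - year_y))
        for venue_x, year_x in links_x
        for venue_y, year_y in links_y
        if venue_x == venue_y
    )


def similarity(links, x, y, decay=0.0):
    """PathSim-style normalization: 2*c(x,y) / (c(x,x) + c(y,y))."""
    denom = (weighted_count(links[x], links[x], decay)
             + weighted_count(links[y], links[y], decay))
    if denom == 0:
        return 0.0
    return 2 * weighted_count(links[x], links[y], decay) / denom


# Hypothetical data: authors "a" and "b" publish in the same years,
# author "c" published at the same venue a decade earlier.
links = {
    "a": [("KDD", 2010), ("KDD", 2011)],
    "b": [("KDD", 2010), ("KDD", 2011)],
    "c": [("KDD", 2000), ("KDD", 2001)],
}

static_ab = similarity(links, "a", "b")
static_ac = similarity(links, "a", "c")
dynamic_ab = similarity(links, "a", "b", decay=0.5)
dynamic_ac = similarity(links, "a", "c", decay=0.5)
```

With `decay=0` the score depends only on link counts, so "b" and "c" look equally similar to "a". With a positive `decay`, "c" scores much lower than "b" because its links are far apart in time from those of "a", which is the kind of time sensitivity the abstract describes.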

CLC number: