计算机应用 ›› 2018, Vol. 38 ›› Issue (2): 471-477.DOI: 10.11772/j.issn.1001-9081.2017082360

• 数据科学与技术 • 上一篇    下一篇

大规模标签图中的动态Top-K兴趣子图查询

宋宝燕, 贾春杰, 单晓欢, 丁琳琳, 丁兴艳   

  1. 辽宁大学 信息学院, 沈阳 110036
  • 收稿日期:2017-08-28 修回日期:2017-09-28 出版日期:2018-02-10 发布日期:2018-02-10
  • 通讯作者: 单晓欢
  • 作者简介:宋宝燕(1965-),女,辽宁开原人,教授,博士,CCF高级会员,主要研究方向:数据库、RFID事件流处理、大数据管理、图数据管理;贾春杰(1991-),女,河北廊坊人,硕士研究生,主要研究方向:图数据管理;单晓欢(1987-),女,辽宁沈阳人,助理实验师,博士研究生,主要研究方向:数据库、图数据管理;丁琳琳(1983-),女,辽宁阜新人,副教授,博士,CCF会员,主要研方向:大数据管理、分布式数据管理、不确定数据管理;丁兴艳(1991-),女,贵州贵阳人,硕士研究生,主要研究方向:图数据管理。
  • 基金资助:
    国家自然科学基金资助项目(61472169,61502215);国家重点研发计划项目(2016YFC0801406);辽宁省教育厅一般项目(L2015193);辽宁省博士科研启动基金项目(201501127)。

Dynamic Top-K interesting subgraph query on large-scale labeled graph

SONG Baoyan, JIA Chunjie, SHAN Xiaohuan, DING Linlin, DING Xingyan   

  1. College of Information, Liaoning University, Shenyang Liaoning 110036, China
  • Received:2017-08-28 Revised:2017-09-28 Online:2018-02-10 Published:2018-02-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472169, 61502215), the National Key Research and Development Program of China (2016YFC0801406), the General Science Research Project of Liaoning Province Education Department (L2015193), Doctoral Scientific Research Start Foundation of Liaoning Province (201501127).

摘要: 针对传统算法由于时间或空间复杂度过高而难以实现规模大且动态变化情况下标签图的Top-K子图查询问题,提出一种适用于大规模标签图的动态Top-K兴趣子图查询方法DISQtop-K。该方法建立了包括节点拓扑结构特性(NTF)索引和边特性(EF)索引的图拓扑结构特性(GTSF)索引,利用该索引可有效剪枝过滤不满足限制条件的无效节点及边;基于GTSF索引提出了多因素候选集过滤策略,通过对查询图候选集进一步剪枝以获得较少的候选集;考虑到图的动态变化可能对匹配结果产生影响,提出了Top-K兴趣子图匹配验证方法——DISQtop-K,将匹配验证过程分为初始匹配和动态修正两个阶段,以尽可能保证查询结果的实时、准确。大量实验结果表明,相比RAM、RWM算法,DISQtop-K方法的索引创建时间较短且占用空间较少,能有效处理大规模标签图中的动态Top-K兴趣子图查询。

关键词: 大规模标签图, 动态Top-K, 兴趣子图, 子图查询

Abstract: The traditional algorithms are difficult to implement the Top-K subgraph query on large-scale dynamic labeled graph due to high time or space complexity. For this reason, a dynamic Top-K interesting subgraph query method named DISQtop-K was proposed. In this algorithm, a Graph Topology Structure Feature (GTSF) index that include Node Topology Feature (NTF) index and Edge Feature (EF) index was established, which can effectively prune and filter the invalid nodes and edges. Then a multi-factor candidate set filtering strategy was put forward based on GTSF index, which can be used to further prune the query graph candidate sets. Considering that the dynamic changes in the graph may have an impact on the matching results, to ensure the real-time and accuracy of the query results, a new matching-verification method for Top-K interesting subgraph was also given, which has two stages of initial matching and dynamic correction. Experimental results show that compared with RAM and RWM, DISQtop-K method costs shorter time for index creation and occupies less space, which can effectively deal with dynamic Top-K interesting subgraph query on large-scale labeled graph.

Key words: large-scale labeled graph, dynamic Top-K, interesting subgraph, subegraph query

中图分类号: