《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (7): 2190-2199.DOI: 10.11772/j.issn.1001-9081.2022060941

• 数据科学与技术 • 上一篇    

基于motif连通性的社区搜索方法

杜明, 顾万里, 周军锋(), 王志军   

  1. 东华大学 计算机科学与技术学院,上海 201620
  • 收稿日期:2022-06-28 修回日期:2022-09-14 接受日期:2022-09-22 发布日期:2022-10-11 出版日期:2023-07-10
  • 通讯作者: 周军锋
  • 作者简介:杜明(1975—),男,黑龙江鸡西人,副教授,博士,主要研究方向:自然语言处理、信息查询、数据分析;
    顾万里(1998—),男,上海人,硕士,主要研究方向:社区查询处理;
    周军锋(1977—),男,陕西西安人,教授,博士,主要研究方向:信息检索、图形数据的查询处理优化;
    王志军(1974—),男,黑龙江五常人,副教授,博士,主要研究方向:物联网信息服务、信息检索。
  • 基金资助:
    上海市自然科学基金资助项目(20ZR1402700)

Community search method based on motif connectivity

Ming DU, Wanli GU, Junfeng ZHOU(), Zhijun WANG   

  1. College of Computer Science and Technology,Donghua University,Shanghai 201620,China
  • Received:2022-06-28 Revised:2022-09-14 Accepted:2022-09-22 Online:2022-10-11 Published:2023-07-10
  • Contact: Junfeng ZHOU
  • About author:DU Ming, born in 1975, Ph. D., associate professor. His research interests include natural language processing, information query, data analysis.
    GU Wanli, born in 1998, M. S. His research interests include community query processing.
    ZHOU Junfeng, born in 1977, Ph. D., professor. His research interests include information retrieval, query processing, optimization of graphical data.
    WANG Zhijun, born in 1974, Ph. D., associate professor. His research interests include internet of things information service, information retrieval.
  • Supported by:
    Natural Science Foundation of Shanghai(20ZR1402700)

摘要:

社区搜索的目标是从数据图中得到包含查询顶点的紧密子图,在社会学、生物学等领域有着广泛应用。针对现有基于子图连通性的社区模型的基础连通结构都是完全连通图,无法满足实际应用中用户对社区结构多样性的需求的问题,提出一种基于motif连通性的社区搜索方法,其中包括基于motif连通性的社区(MCC)模型以及两个相应的社区搜索算法——MPCS (Motif-Processed Community Search)算法和基于MP-index的社区搜索算法。MCC模型可以协助用户自由指定社区的基础连通结构,MPCS算法可以用来解决MCC的搜索问题。此外,提出两个分别针对motif实例搜索过程及所属社区判断过程的剪枝优化技术。最后,设计了MP-index以避免社区搜索过程中的冗余遍历操作。在多个真实数据集上进行实验的结果表明:剪枝优化可以使MPCS算法的耗时减少60%~85%,而基于MP-index的社区搜索算法相较于加入剪枝优化的MPCS算法,效率提升普遍达到了2~3个数量级。可见,所提方法在商品推荐和社交网络等问题上有着实际应用价值。

关键词: 社区搜索, motif连通性, 子图连通性社区, 剪枝优化, 社区结构多样性

Abstract:

The goal of community search is to obtain compact subgraphs containing query vertices from data graphs, and community search is widely used in sociology, biology, and other fields. In view of the fact that the basic connectivity structures of the existing community models based on subgraph connectivity are all completely connected graphs, which cannot meet the needs of users for the diversity of community structures in practical applications, a community search method based on motif connectivity was proposed, including Motif-Connective Community (MCC) model based on motif connectivity and two corresponding community search algorithms — MPCS (Motif-Processed Community Search) algorithm and MP-index based community search algorithm. MCC model was able to help users freely specify the basic connectivity structure of community, and MPCS algorithm was able to be used to solve the search problem of MCC. Furthermore, two pruning optimization techniques were proposed for the motif instance search process and the belonged community judgment process. Finally, the MP-index forest was designed to avoid redundant traversal operations in the process of community search. Experimental results on multiple real datasets show that the pruning optimization can reduce the running time of MPCS algorithm by 60% to 85%, and the efficiency of the community search algorithm based on MP-index forest is improved by two to three orders of magnitude compared with the efficiency of MPCS algorithm added with pruning optimization. It can be seen that the proposed method has practical application values in commodity recommendation, social network and other issues.

Key words: community search, motif connectivity, subgraph connectivity community, pruning optimization, diversity of community structure

中图分类号: