《计算机应用》唯一官方网站 ›› 2023, Vol. 43 ›› Issue (7): 2190-2199.DOI: 10.11772/j.issn.1001-9081.2022060941
• 数据科学与技术 • 上一篇
收稿日期:
2022-06-28
修回日期:
2022-09-14
接受日期:
2022-09-22
发布日期:
2022-10-11
出版日期:
2023-07-10
通讯作者:
周军锋
作者简介:
杜明(1975—),男,黑龙江鸡西人,副教授,博士,主要研究方向:自然语言处理、信息查询、数据分析;基金资助:
Ming DU, Wanli GU, Junfeng ZHOU(), Zhijun WANG
Received:
2022-06-28
Revised:
2022-09-14
Accepted:
2022-09-22
Online:
2022-10-11
Published:
2023-07-10
Contact:
Junfeng ZHOU
About author:
DU Ming, born in 1975, Ph. D., associate professor. His research interests include natural language processing, information query, data analysis.Supported by:
摘要:
社区搜索的目标是从数据图中得到包含查询顶点的紧密子图,在社会学、生物学等领域有着广泛应用。针对现有基于子图连通性的社区模型的基础连通结构都是完全连通图,无法满足实际应用中用户对社区结构多样性的需求的问题,提出一种基于motif连通性的社区搜索方法,其中包括基于motif连通性的社区(MCC)模型以及两个相应的社区搜索算法——MPCS (Motif-Processed Community Search)算法和基于MP-index的社区搜索算法。MCC模型可以协助用户自由指定社区的基础连通结构,MPCS算法可以用来解决MCC的搜索问题。此外,提出两个分别针对motif实例搜索过程及所属社区判断过程的剪枝优化技术。最后,设计了MP-index以避免社区搜索过程中的冗余遍历操作。在多个真实数据集上进行实验的结果表明:剪枝优化可以使MPCS算法的耗时减少60%~85%,而基于MP-index的社区搜索算法相较于加入剪枝优化的MPCS算法,效率提升普遍达到了2~3个数量级。可见,所提方法在商品推荐和社交网络等问题上有着实际应用价值。
中图分类号:
杜明, 顾万里, 周军锋, 王志军. 基于motif连通性的社区搜索方法[J]. 计算机应用, 2023, 43(7): 2190-2199.
Ming DU, Wanli GU, Junfeng ZHOU, Zhijun WANG. Community search method based on motif connectivity[J]. Journal of Computer Applications, 2023, 43(7): 2190-2199.
图5 以图1中的无向无权图G为例、motif为图2(d)、最高阈值为4的MP-index实例
Fig. 5 Example of MP-index for undirected graph Gin figure 1 with motif of figure 2(d) and maximum threshold of 4
数据集 | ||
---|---|---|
Ecoo | 12 620 | 13 350 |
Fb-sports | 13 617 | 86 890 |
Yago | 6 642 | 42 392 |
10go-uniprot | 469 526 | 3 476 397 |
Cit-Patents | 3 774 768 | 16 518 947 |
Arxiv | 6 000 | 133 414 |
Soc-academia | 201 211 | 2 491 576 |
Citeseer | 10 720 | 44 258 |
Soc-flixster | 3 365 623 | 7 989 191 |
表1 数据集统计信息
Tab. 1 Statistics for datasets
数据集 | ||
---|---|---|
Ecoo | 12 620 | 13 350 |
Fb-sports | 13 617 | 86 890 |
Yago | 6 642 | 42 392 |
10go-uniprot | 469 526 | 3 476 397 |
Cit-Patents | 3 774 768 | 16 518 947 |
Arxiv | 6 000 | 133 414 |
Soc-academia | 201 211 | 2 491 576 |
Citeseer | 10 720 | 44 258 |
Soc-flixster | 3 365 623 | 7 989 191 |
数据集 | MPCS* | MP-index |
---|---|---|
Ecoo | 3.58 | 0.003 8 |
Yago | 2.03 | 0.002 3 |
10go-uniprot | 100.25 | 0.021 0 |
Cit-Patents | 152.67 | 0.039 0 |
Arxiv | 61.80 | 0.033 0 |
Soc-academia | 181.59 | 0.071 0 |
Citeseer | 31.05 | 0.006 7 |
Soc-flixster | 13.04 | 0.001 2 |
表2 MPCS*算法和基于MP-index的社区搜索算法的运行时间对比 ( s)
Tab. 2 Running time comparison of MPCS* algorithm and MP-index based community search algorithm
数据集 | MPCS* | MP-index |
---|---|---|
Ecoo | 3.58 | 0.003 8 |
Yago | 2.03 | 0.002 3 |
10go-uniprot | 100.25 | 0.021 0 |
Cit-Patents | 152.67 | 0.039 0 |
Arxiv | 61.80 | 0.033 0 |
Soc-academia | 181.59 | 0.071 0 |
Citeseer | 31.05 | 0.006 7 |
Soc-flixster | 13.04 | 0.001 2 |
图8 不同阈值下的MPCS*算法和基于MP-index的社区搜索算法的运行时间对比
Fig. 8 Running time comparison of MPCS* algorithm and MP-index based community search algorithm under different thresholds
1 | CHEN H P, CONTE A, GROSSI R, et al. On breaking truss-based communities [C]// Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2021: 117-126. 10.1145/3447548.3467365 |
2 | 竺俊超, 王朝坤. 复杂条件下的社区搜索方法[J]. 软件学报, 2019, 30 (3): 552-572. |
ZHU J C, WANG C K. Approaches to community search under complex conditions[J]. Journal of Software, 2019, 30 (3): 552-572. | |
3 | HUANG X, CHENG H, QIN L, et al. Querying k-truss community in large and dynamic graphs [C]// Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2014: 1311-1322. 10.1145/2588555.2610495 |
4 | WANG Y Q, GU Y, SHUN J L. Theoretically-efficient and practical DBSCAN [C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 2555-2571. 10.1145/3318464.3380582 |
5 | PACACI A, BONIFATI A, ÖZSU M T. Regular path query evaluation on streaming graphs [C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 1415-1430. 10.1145/3318464.3389733 |
6 | YUAN L, QIN L, ZHANG W J, et al. Index-based densest clique percolation community search in networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30 (5): 922-935. 10.1109/tkde.2017.2783933 |
7 | LIN D D, WONG R C W, XIE M, et al. Index-free approach with theoretical guarantee for efficient random walk with restart query [C]// Proceedings of the 2020 IEEE International Conference on Data Engineering. Piscataway: IEEE, 2020: 913-924. 10.1109/icde48307.2020.00084 |
8 | FRITZ M, BEHRINGER M, SCHWARZ H. LOG-Means: efficiently estimating the number of clusters in large datasets[J]. Proceedings of the VLDB Endowment, 2020, 13 (12): 2118-2131. 10.14778/3407790.3407813 |
9 | BANERJEE S, PAL B. On the enumeration of maximal (Δ,γ)-cliques of a temporal network [C]// Proceedings of the 2019 ACM India Joint International Conference on Data Science and Management of Data. New York: ACM, 2019: 112-120. 10.1145/3297001.3297015 |
10 | CHANG L J. Efficient maximum clique computation over large sparse graphs [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 529-538. 10.1145/3292500.3330986 |
11 | BRON C, KERBOSCH J. Algorithm 457: finding all cliques of an undirected graph[J]. Communications of the ACM, 1973, 16 (9): 575-577. 10.1145/362342.362367 |
12 | TOMITA E, TANAKA A, TAKAHASHI H. The worst-case time complexity for generating all maximal cliques and computational experiments[J]. Theoretical Computer Science, 2006, 363 (1): 28-42. 10.1016/j.tcs.2006.06.015 |
13 | FANG Y X, WANG Z, CHENG R, et al. On spatial-aware community search[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31 (4): 783-798. 10.1109/tkde.2018.2845414 |
14 | WEI V J, WONG R C W, LONG C. Architecture-intact oracle for fastest path and time queries on dynamic spatial networks [C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 1841-1856. 10.1145/3318464.3389718 |
15 | SOZIO M, GIONIS A. The community-search problem and how to plan a successful cocktail party [C]// Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 939-948. 10.1145/1835804.1835923 |
16 | REZVANI M, REZVANI M. Truss decomposition using triangle graphs[J]. Soft Computing, 2022, 26 (1): 55-68. 10.1007/s00500-021-06468-9 |
17 | LI X D, CHENG R, CHANG K C C, et al. On analyzing graphs with motif-paths[J]. Proceedings of the VLDB Endowment, 2021, 14 (6): 1111-1123. 10.14778/3447689.3447714 |
18 | MA C H, CHENG R, LAKSHMANAN L V S, et al. LINC: a motif counting algorithm for uncertain graphs[J]. Proceedings of the VLDB Endowment, 2019, 13 (2): 155-168. 10.14778/3364324.3364330 |
19 | MICALE G, GIUGNO R, FERRO A, et al. Fast analytical methods for finding significant labeled graph motifs[J]. Data Mining and Knowledge Discovery, 2018, 32 (2): 504-531. 10.1007/s10618-017-0544-8 |
20 | BENSON A R, GLEICH D F, LESKOVEC J. Tensor spectral clustering for partitioning higher-order network structures [C]// Proceedings of the 2015 SIAM International Conference on Data Mining. Philadelphia, PA: SIAM, 2015: 118-126. 10.1137/1.9781611974010.14 |
21 | BENSON A R, GLEICH D F, LESKOVEC J. Higher-order organization of complex networks[J]. Science of Computer and Data, 2016, 353 (6295): 163-166. 10.1126/science.aad9029 |
22 | FANG Y X, CHENG R, LI X D, et al. Effective community search over large spatial graphs[J]. Proceedings of the VLDB Endowment, 2017, 10 (6): 709-720. 10.14778/3055330.3055337 |
23 | CAO J, LI B T, GUI X Q. Research on the influence of network motif on link prediction [C]// Proceedings of the 2016 6th International Conference on Information Technology for Manufacturing Systems. Lancaster: DEStech Transactions on Computer Science and Engineering, 2016. 10.12783/dtcse/itms2016/9455 |
24 | DAVE V S, AHMED N K, HASAN M AL. E-CLoG: counting edge-centric local graphlets [C]// Proceedings of the 2017 IEEE International Conference on Big Data. Piscataway: IEEE, 2017: 586-595. 10.1109/bigdata.2017.8257974 |
25 | LI P Z, HUANG L, WANG C D, et al. EdMot: an Edge enhancement approach for motif-aware community detection [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 479-487. 10.1145/3292500.3330882 |
26 | ABUODA G, DE FRANCISCI MORALES G, ABOULNAGA A. Link prediction via higher-order motif features [C]// Proceedings of the 2019 Joint European Conference on Machine Learning and Knowledge Discovery in Databases, LNCS 11906. Cham: Springer, 2020: 412-429. |
[1] | 杜晓昕 周薇 王浩 郝田茹 王振飞 金梅 张剑飞. 智能算法的亚群优化策略综述[J]. 《计算机应用》唯一官方网站, 0, (): 0-0. |
[2] | 单芝慧, 韩萌, 韩强. 增量数据上的闭合定量高效用项集挖掘算法[J]. 《计算机应用》唯一官方网站, 2023, 43(7): 2049-2056. |
[3] | 李飞, 乐强, 潘紫微, 孙怡宁, 余晓流. 基于自动快速密度峰值聚类的粒子群动态优化算法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 154-162. |
[4] | 白玮, 王成, 王彩玲, 詹熙, 张磊. 基于蚁群数量动态调整的改进蚁群优化算法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 163-168. |
[5] | 李彦苹, 孙广宇, 杨文轩, 李传宪, 赵文亮, 牛化昶, 于洋. 基于自适应权重调整与差分进化策略的并行式混合蛙跳算法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 169-176. |
[6] | 王一卓, 王磊, 徐方洁, 张亚光. 基于程序重写的浮点程序精度缺陷修复方法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 177-181. |
[7] | 赵译文, 刘云鹏. 基于子空间流形的图像集识别方法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 207-211. |
[8] | 王建华, 李乐, 孟学雷. 基于收敛粒子群算法的重载铁路列车运行调整方法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 307-313. |
[9] | 李牧, 李倩, 柯熙政, 陶启婷. 基于两阶段运动伪影消除的心率检测算法[J]. 《计算机应用》唯一官方网站, 2023, 43(S1): 333-339. |
[10] | 贾鹤鸣, 力尚龙, 陈丽珍, 刘庆鑫, 吴迪, 郑荣. 基于混沌宿主切换机制的䲟鱼优化算法[J]. 《计算机应用》唯一官方网站, 2023, 43(6): 1759-1767. |
[11] | 袁泉, 唐成亮, 徐雲鹏. 基于长度约束的蝙蝠高效用项集挖掘算法[J]. 《计算机应用》唯一官方网站, 2023, 43(5): 1473-1480. |
[12] | 孙亚男, 吴杰宏, 石峻岭, 高利军. 改进自组织映射的多无人机协同任务分配方法[J]. 《计算机应用》唯一官方网站, 2023, 43(5): 1551-1556. |
[13] | 程顺航, 李志华, 魏涛. 融合自举与语义角色标注的威胁情报实体关系抽取方法[J]. 《计算机应用》唯一官方网站, 2023, 43(5): 1445-1453. |
[14] | 马磊 罗川 李天瑞 陈红梅. 基于模糊粗糙集的无监督动态特征选择算法[J]. 《计算机应用》唯一官方网站, 0, (): 0-0. |
[15] | 张瑾 徐文 周宇乔 刘凯. 不规则物体点云切片中的多轮廓分割算法[J]. 《计算机应用》唯一官方网站, 0, (): 0-0. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||