Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (7): 2040-2048.DOI: 10.11772/j.issn.1001-9081.2022071130
Special Issue: 第39届CCF中国数据库学术会议(NDBC 2022)
• The 39th CCF National Database Conference (NDBC 2022) • Previous Articles Next Articles
Chunze CAO1, Delong MA1, Ye YUAN2()
Received:
2022-07-12
Revised:
2022-08-28
Accepted:
2022-09-05
Online:
2023-07-20
Published:
2023-07-10
Contact:
Ye YUAN
About author:
CAO Chunze, born in 1997, M. S. candidate. His research interests include geo-distributed computing, triangle counting.Supported by:
通讯作者:
袁野
作者简介:
曹春泽(1997—),男,安徽合肥人,硕士研究生,主要研究方向:跨域计算、三角计数;基金资助:
CLC Number:
Chunze CAO, Delong MA, Ye YUAN. GTC: geo-distributed triangle counting algorithm in graph streams[J]. Journal of Computer Applications, 2023, 43(7): 2040-2048.
曹春泽, 马德龙, 袁野. 跨域环境下图流三角计数算法GTC[J]. 《计算机应用》唯一官方网站, 2023, 43(7): 2040-2048.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022071130
符号 | 描述 |
---|---|
图流 | |
索引为 | |
顶点 | |
顶点 | |
顶点 | |
Worker数量 | |
图流中三角形数量的估计值 | |
Worker | |
水库容量 | |
顶点 | |
顶点 | |
Worker | |
Worker |
Tab. 1 Symbol table
符号 | 描述 |
---|---|
图流 | |
索引为 | |
顶点 | |
顶点 | |
顶点 | |
Worker数量 | |
图流中三角形数量的估计值 | |
Worker | |
水库容量 | |
顶点 | |
顶点 | |
Worker | |
Worker |
计算节点 | 上行带宽 | 下行带宽 |
---|---|---|
Master | 10 | 10 |
Worker | 5 | 5 |
Worker | 1 | 5 |
Tab. 2 Bandwidths of computing nodes
计算节点 | 上行带宽 | 下行带宽 |
---|---|---|
Master | 10 | 10 |
Worker | 5 | 5 |
Worker | 1 | 5 |
算法 | 时间复杂度 | 空间复杂度 | ||
---|---|---|---|---|
Master | Worker(全部) | Master | Worker(全部) | |
Tri-Fly | ||||
GTC |
Tab. 3 Time complexities and space complexities of different algorithms
算法 | 时间复杂度 | 空间复杂度 | ||
---|---|---|---|---|
Master | Worker(全部) | Master | Worker(全部) | |
Tri-Fly | ||||
GTC |
名称 | 顶点数 | 边数 | 三角形数 |
---|---|---|---|
ArXiv | 30 565 | 346 849 | 988 009 |
61 096 | 614 797 | 1 756 259 | |
86 978 | 297 456 | 1 180 387 | |
YouTube | 3 181 831 | 7 505 218 | 7 766 821 |
LiveJournal | 4 847 571 | 68 993 773 | 285 730 264 |
Tab. 4 Properties of experimental datasets
名称 | 顶点数 | 边数 | 三角形数 |
---|---|---|---|
ArXiv | 30 565 | 346 849 | 988 009 |
61 096 | 614 797 | 1 756 259 | |
86 978 | 297 456 | 1 180 387 | |
YouTube | 3 181 831 | 7 505 218 | 7 766 821 |
LiveJournal | 4 847 571 | 68 993 773 | 285 730 264 |
数据集 | Tri-Fly | SIMPLE | OPT | GTC |
---|---|---|---|---|
ArXiv | 0.573 | 0.542 | 0.310 | 0.302 |
0.551 | 0.476 | 0.273 | 0.269 | |
0.549 | 0.539 | 0.263 | 0.253 | |
YouTube | 0.578 | 0.495 | 0.255 | 0.251 |
LiveJournal | 0.612 | 0.513 | 0.254 | 0.281 |
Tab. 5 Error rates of different algorithms
数据集 | Tri-Fly | SIMPLE | OPT | GTC |
---|---|---|---|---|
ArXiv | 0.573 | 0.542 | 0.310 | 0.302 |
0.551 | 0.476 | 0.273 | 0.269 | |
0.549 | 0.539 | 0.263 | 0.253 | |
YouTube | 0.578 | 0.495 | 0.255 | 0.251 |
LiveJournal | 0.612 | 0.513 | 0.254 | 0.281 |
数据集 | Tri-Fly | SIMPLE | OPT | GTC |
---|---|---|---|---|
ArXiv | 3.10 | 2.63 | 2.23 | 2.36 |
17.38 | 14.23 | 10.27 | 11.54 | |
16.58 | 13.03 | 9.87 | 10.34 | |
YouTube | 223.81 | 193.77 | 137.43 | 145.87 |
LiveJournal | 1 420.82 | 1 232.47 | 819.61 | 854.46 |
Tab. 6 Running time of different algorithms
数据集 | Tri-Fly | SIMPLE | OPT | GTC |
---|---|---|---|---|
ArXiv | 3.10 | 2.63 | 2.23 | 2.36 |
17.38 | 14.23 | 10.27 | 11.54 | |
16.58 | 13.03 | 9.87 | 10.34 | |
YouTube | 223.81 | 193.77 | 137.43 | 145.87 |
LiveJournal | 1 420.82 | 1 232.47 | 819.61 | 854.46 |
1 | WASSERMAN S, FAUST K. Social Network Analysis: Methods and Applications[M]. Cambridge: Cambridge University Press, 1994: 108-109. 10.1017/s0048840200023959 |
2 | BECCHETTI L, BOLDI P, CASTILLO C, et al. Efficient algorithms for large-scale local triangle counting[J]. ACM Transactions on Knowledge Discovery from Data, 2010, 4(3): No.13. 10.1145/1839490.1839494 |
3 | 金宏桥,董一鸿,陈华辉,等.分布式环境下基于马尔科夫链的图流三角近似计算[J].电子学报, 2018, 46(9): 2139-2148. 10.3969/j.issn.0372-2112.2018.09.014 |
JIN H Q, DONG Y H, CHEN H H, et al. DATC-MC: Distributed approximate triangle counting based on Markov chain in graph streaming[J]. Acta Electronica Sinica, 2018, 46(9): 2139-2148. 10.3969/j.issn.0372-2112.2018.09.014 | |
4 | KUTZKOV K, PAGH R. On the streaming complexity of computing local clustering coefficients [C]// Proceedings of the 6th ACM International Conference on Web Search and Data Mining. New York: ACM, 2013: 677-686. 10.1145/2433396.2433480 |
5 | SHIN K, LEE E, OH J, et al. CoCoS: fast and accurate distributed triangle counting in graph streams[J]. ACM Transactions on Knowledge Discovery from Data, 2021, 15(3): No.38. 10.1145/3441487 |
6 | SHIN K, HAMMOUD M, LEE E, et al. Tri-Fly: distributed estimation of global and local triangle counts in graph streams [C]// Proceedings of the 2018 Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNCS 10939. Cham: Springer, 2018: 651-663. |
7 | PU Q F, ANANTHANARAYANAN G, BODIK P, et al. Low latency geo-distributed data analytics[J]. ACM SIGCOMM Computer Communication Review, 2015, 45(4): 421-434. 10.1145/2829988.2787505 |
8 | NAWAB F, AGRAWAL D, ABBADI A EL. The challenges of global-scale data management [C]// Proceedings of the 2016 International Conference on Management of Data. New York: ACM, 2016: 2223-2227. 10.1145/2882903.2912571 |
9 | ZHOU A C, SHEN B K, XIAO Y, et al. Cost-aware partitioning for efficient large graph processing in geo-distributed datacenters[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31(7): 1707-1723. 10.1109/tpds.2019.2955494 |
10 | SHIN K. WRS: waiting room sampling for accurate triangle counting in real graph streams [C]// Proceedings of the 2017 IEEE International Conference on Data Mining. Piscataway: IEEE, 2017: 1087-1092. 10.1109/icdm.2017.143 |
11 | VITTER J S. Random sampling with a reservoir[J]. ACM Transactions on Mathematical Software, 1985, 11(1): 37-57. 10.1145/3147.3165 |
12 | LIM Y, KANG U. MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams [C]// Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2015: 685-694. 10.1145/2783258.2783285 |
13 | TSOURAKAKIS C E, KANG U, MILLER G L, et al. DOULION: counting triangles in massive graphs with a coin [C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2009: 837-846. 10.1145/1557019.1557111 |
14 | DE STEFANI L, EPASTO A, RIONDATO M, et al. TRIÈST: counting local and global triangles in fully dynamic streams with fixed memory size[J]. ACM Transactions on Knowledge Discovery from Data, 2017, 11(4): No.43. 10.1145/3059194 |
15 | GEMULLA R, LEHNER W, HAAS P J. Maintaining bounded-size sample synopses of evolving datasets[J]. The VLDB Journal, 2008, 17(2): 173-201. 10.1007/s00778-007-0065-y |
16 | WANG P H, JIA P, QI Y Y, et al. REPT: a streaming algorithm of approximating global and local triangle counts in parallel [C]// Proceedings of the IEEE 35th International Conference on Data Engineering. Piscataway: IEEE, 2019: 758-769. 10.1109/icde.2019.00073 |
17 | ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark: cluster computing with working sets [C]// Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing. Berkeley: USENIX Association, 2010: 1-7. |
18 | GONZALEZ J E, LOW Y C, GU H J, et al. PowerGraph: distributed graph-parallel computation on natural graphs [C]// Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2012: 17-30. |
19 | ABOU-RJEILI A, KARYPIS G. Multilevel algorithms for partitioning power-law graphs [C]// Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium. Piscataway: IEEE, 2006: 1-10. 10.1109/ipdps.2006.1639360 |
20 | AHMED N K, DUFFIELD N, WILLKE T L, et al. On sampling from massive graph streams[J]. Proceedings of the VLDB Endowment, 2017, 10(11): 1430-1441. 10.14778/3137628.3137651 |
21 | GEHRKE J, GINSPARG P, KLEINBERG J. Overview of the 2003 KDD Cup[J]. ACM SIGKDD Explorations Newsletter, 2003, 5(2): 149-151. 10.1145/980972.980992 |
22 | VISWANATH B, MISLOVE A, CHA M, et al. On the evolution of user interaction in Facebook [C]// Proceedings of the 2nd ACM Workshop on Online Social Networks. New York: ACM, 2009: 37-42. 10.1145/1592665.1592675 |
23 | KLIMT B, YANG Y M. Introducing the Enron corpus[C/OL]// Proceedings of 1st Conference on Email and Anti-Spam [2022-05-23]. . 10.1007/978-3-540-30115-8_22 |
24 | MISLOVE A E. Online social networks: measurement, analysis, and applications to distributed information systems[R]. Houston, TX: Rice University, 2009: 66-66. |
25 | BACKSTROM L, HUTTENLOCHER D, KLEINBERG J, et al. Group formation in large social networks: membership, growth, and evolution [C]// Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006: 44-54. 10.1145/1150402.1150412 |
[1] | Qi SHUAI, Hairui WANG, Guifu ZHU. Chinese story ending generation model based on bidirectional contrastive training [J]. Journal of Computer Applications, 2024, 44(9): 2683-2688. |
[2] | Qiangkui LENG, Xuezi SUN, Xiangfu MENG. Oversampling method for imbalanced data based on sample potential and noise evolution [J]. Journal of Computer Applications, 2024, 44(8): 2466-2475. |
[3] | Mingzhu LEI, Hao WANG, Rong JIA, Lin BAI, Xiaoying PAN. Oversampling algorithm based on synthesizing minority class samples using relationship between features [J]. Journal of Computer Applications, 2024, 44(5): 1428-1436. |
[4] | Wenping ZHENG, Huilin GE, Meilin LIU, Gui YANG. Node classification algorithm fusing 2-connected motif-structure information [J]. Journal of Computer Applications, 2024, 44(5): 1464-1470. |
[5] | Kui ZHAO, Huiqi QIU, Xu LI, Zhifei XU. Real-time pulmonary nodule detection algorithm combining attention and multipath fusion [J]. Journal of Computer Applications, 2024, 44(3): 945-952. |
[6] | Jinbo LI, Ping ZHANG, Ji ZHANG, Muhua LIU. Identity-based ring signature scheme on number theory research unit lattice [J]. Journal of Computer Applications, 2023, 43(9): 2798-2805. |
[7] | Zhongyu LI, Haodong SUN, Jiao LI. Lightweight gesture recognition algorithm for basketball referee [J]. Journal of Computer Applications, 2023, 43(7): 2173-2181. |
[8] | Yi JIANG, Shuping WU, Kun HU, Linbo LONG. Imbalanced data classification method based on Lasso and constructive covering algorithm [J]. Journal of Computer Applications, 2023, 43(4): 1086-1093. |
[9] | Jian GAO, Zhi LI, Bin FAN, Chuanxian JIANG. Efficient robust zero-watermarking algorithm for 3D medical images based on ray-casting sampling and quaternion orthogonal moment [J]. Journal of Computer Applications, 2023, 43(4): 1191-1197. |
[10] | Xincheng ZHONG, Chang LIU, Xiumei ZHAO. High utility itemset mining algorithm based on Markov optimization [J]. Journal of Computer Applications, 2023, 43(12): 3764-3771. |
[11] | Yonggui WANG, Qiwen SHI. Social recommendation by enhanced GNN with heterogeneous relationship [J]. Journal of Computer Applications, 2023, 43(11): 3464-3471. |
[12] | Zixing YU, Shaojun QU, Xin HE, Zhuo WANG. High-low dimensional feature guided real-time semantic segmentation network [J]. Journal of Computer Applications, 2023, 43(10): 3077-3085. |
[13] | Xiaoyan LU, Yang XU, Wenhao YUAN. Multiscale dense fusion network for lung lesion image segmentation [J]. Journal of Computer Applications, 2023, 43(10): 3282-3289. |
[14] | NONG Qiang, ZHANG Bangbang, OUYANG Yuhao. Lattice-based hierarchical certificateless proxy signature scheme [J]. Journal of Computer Applications, 2023, 43(1): 154-159. |
[15] | Shuxin YANG, Jingfeng XU. Positive influence maximization based on reverse influence sampling [J]. Journal of Computer Applications, 2022, 42(8): 2609-2616. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||