Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 510-517.DOI: 10.11772/j.issn.1001-9081.2019091666
• CCF Bigdata 2019 • Previous Articles Next Articles
Danfeng ZHAO1, Junchen LIN1, Wei SONG1, Jian WANG1, Dongmei HUANG2()
Received:
2019-08-30
Revised:
2019-09-29
Accepted:
2019-10-09
Online:
2019-10-14
Published:
2020-02-10
Contact:
Dongmei HUANG
About author:
ZHAO Danfeng, born in 1982. Ph. D., lecturer. Her research interests include big data storage, business process management.Supported by:
通讯作者:
黄冬梅
作者简介:
赵丹枫(1982—),女,黑龙江五常人,讲师,博士,CCF会员,主要研究方向:大数据存储、业务流程管理基金资助:
CLC Number:
Danfeng ZHAO, Junchen LIN, Wei SONG, Jian WANG, Dongmei HUANG. Strategy with low redundant computation for reachability query preserving graph compression[J]. Journal of Computer Applications, 2020, 40(2): 510-517.
赵丹枫, 林俊辰, 宋巍, 王建, 黄冬梅. 低冗余计算的可达性查询保持图压缩策略[J]. 《计算机应用》唯一官方网站, 2020, 40(2): 510-517.
Add to citation manager EndNote|Ris|BibTeX
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019091666
数据集 | 顶点数(|V|) | 边数(|E|) | 数据来源文献 | 在文中的缩写 |
---|---|---|---|---|
Youtube_0 | 3 356 | 3 615 | [ | Y0 |
Web-California | 6 175 | 16 150 | [ | WebC |
Wiki-Vote | 7 115 | 103 689 | [ | WikiV |
Youtube_1 | 26 845 | 60 603 | [ | Y1 |
Cit-HepTh | 27 769 | 352 768 | [ | CHT |
Cit-HepPh | 34 546 | 421 534 | [ | CHP |
Tab. 1 Experimental datasets
数据集 | 顶点数(|V|) | 边数(|E|) | 数据来源文献 | 在文中的缩写 |
---|---|---|---|---|
Youtube_0 | 3 356 | 3 615 | [ | Y0 |
Web-California | 6 175 | 16 150 | [ | WebC |
Wiki-Vote | 7 115 | 103 689 | [ | WikiV |
Youtube_1 | 26 845 | 60 603 | [ | Y1 |
Cit-HepTh | 27 769 | 352 768 | [ | CHT |
Cit-HepPh | 34 546 | 421 534 | [ | CHP |
软、硬件环境 | 配置 |
---|---|
操作系统 | Ubuntu Server 18.04 |
CPU | Intel Xeon Gold 6130 CPU @ 2.10 GHz |
内核数 | 2 |
内存 | 256 GB |
硬盘 | 1 TB |
Spark | 2.4.3 |
Hadoop | 2.7 |
Python | 3.7.0 |
Worker 个数 | 4 |
Tab. 2 Configuration of experimental environment
软、硬件环境 | 配置 |
---|---|
操作系统 | Ubuntu Server 18.04 |
CPU | Intel Xeon Gold 6130 CPU @ 2.10 GHz |
内核数 | 2 |
内存 | 256 GB |
硬盘 | 1 TB |
Spark | 2.4.3 |
Hadoop | 2.7 |
Python | 3.7.0 |
Worker 个数 | 4 |
数据集 | 顶点数(|V|) | 边数(|E|) | 压缩图顶点数(|Vr|) | 压缩图边数(|Er|) | 压缩率/% |
---|---|---|---|---|---|
Y0 | 3 356 | 3 615 | 350 | 248 | 8.58 |
WebC | 6 175 | 16 150 | 3 802 | 11 826 | 70.00 |
WikiV | 7 115 | 103 689 | 1 016 | 2 666 | 3.32 |
Y1 | 26 845 | 60 603 | 3 717 | 7 331 | 12.63 |
CHT | 27 769 | 352 768 | 18 821 | 127 280 | 38.39 |
CHP | 34 546 | 421 534 | 18 683 | 109 583 | 28.12 |
Tab. 3 Compression ratio of the proposed algorithm
数据集 | 顶点数(|V|) | 边数(|E|) | 压缩图顶点数(|Vr|) | 压缩图边数(|Er|) | 压缩率/% |
---|---|---|---|---|---|
Y0 | 3 356 | 3 615 | 350 | 248 | 8.58 |
WebC | 6 175 | 16 150 | 3 802 | 11 826 | 70.00 |
WikiV | 7 115 | 103 689 | 1 016 | 2 666 | 3.32 |
Y1 | 26 845 | 60 603 | 3 717 | 7 331 | 12.63 |
CHT | 27 769 | 352 768 | 18 821 | 127 280 | 38.39 |
CHP | 34 546 | 421 534 | 18 683 | 109 583 | 28.12 |
数据集 | 运行时间/s | 加速比/% | |||
---|---|---|---|---|---|
QPGC | TSB | AGGB | TSB | AGGB | |
Y0 | 1.34 | 0.06 | 0.05 | 95.28 | 96.41 |
WebC | 4.67 | 0.39 | 0.54 | 91.60 | 88.37 |
WikiV | 24.92 | 1.95 | 2.25 | 92.17 | 90.98 |
Y1 | 74.53 | 1.20 | 4.03 | 98.39 | 94.60 |
CHT | 309.85 | 15.40 | 52.23 | 95.03 | 83.14 |
CHP | 705.15 | 50.41 | 95.33 | 92.85 | 86.48 |
Tab. 4 Performance comparison of TSB, AGGB and QPGC
数据集 | 运行时间/s | 加速比/% | |||
---|---|---|---|---|---|
QPGC | TSB | AGGB | TSB | AGGB | |
Y0 | 1.34 | 0.06 | 0.05 | 95.28 | 96.41 |
WebC | 4.67 | 0.39 | 0.54 | 91.60 | 88.37 |
WikiV | 24.92 | 1.95 | 2.25 | 92.17 | 90.98 |
Y1 | 74.53 | 1.20 | 4.03 | 98.39 | 94.60 |
CHT | 309.85 | 15.40 | 52.23 | 95.03 | 83.14 |
CHP | 705.15 | 50.41 | 95.33 | 92.85 | 86.48 |
数据集 | 中间变量占用的存储空间大小 | ||
---|---|---|---|
QPGC/KB | TSB/KB | AGGB/B | |
Y0 | 10.27 | 21.73 | 6 |
WebC | 18.01 | 36.53 | 6 |
WikiV | 19.85 | 38.48 | 6 |
Y1 | 79.98 | 165.83 | 6 |
CHT | 76.00 | 144.61 | 6 |
CHP | 89.24 | 158.67 | 6 |
Tab. 5 Comparison of memory usage of intermediate variables of three algorithms
数据集 | 中间变量占用的存储空间大小 | ||
---|---|---|---|
QPGC/KB | TSB/KB | AGGB/B | |
Y0 | 10.27 | 21.73 | 6 |
WebC | 18.01 | 36.53 | 6 |
WikiV | 19.85 | 38.48 | 6 |
Y1 | 79.98 | 165.83 | 6 |
CHT | 76.00 | 144.61 | 6 |
CHP | 89.24 | 158.67 | 6 |
S | 不同数据集的存储空间占用大小/KB | |||||
---|---|---|---|---|---|---|
Y0 | WebC | WikiV | Y1 | CHT | CHP | |
2 | 2 749.71 | 9 310.76 | 12 360.94 | 175 947.49 | 188 267.87 | 291 363.81 |
50 | 111.44 | 373.89 | 496.81 | 7 038.96 | 7 538.86 | 11 655.92 |
100 | 55.73 | 186.95 | 250.15 | 3 526.04 | 3 769.44 | 5 836.40 |
200 | 27.87 | 93.48 | 125.08 | 1 769.58 | 1 884.73 | 2 918.21 |
500 | 11.48 | 39.21 | 52.13 | 707.84 | 759.32 | 1 180.79 |
1 000 | 6.57 | 21.12 | 27.81 | 353.93 | 379.67 | 590.40 |
Tab. 6 Memory usage under different segment size S
S | 不同数据集的存储空间占用大小/KB | |||||
---|---|---|---|---|---|---|
Y0 | WebC | WikiV | Y1 | CHT | CHP | |
2 | 2 749.71 | 9 310.76 | 12 360.94 | 175 947.49 | 188 267.87 | 291 363.81 |
50 | 111.44 | 373.89 | 496.81 | 7 038.96 | 7 538.86 | 11 655.92 |
100 | 55.73 | 186.95 | 250.15 | 3 526.04 | 3 769.44 | 5 836.40 |
200 | 27.87 | 93.48 | 125.08 | 1 769.58 | 1 884.73 | 2 918.21 |
500 | 11.48 | 39.21 | 52.13 | 707.84 | 759.32 | 1 180.79 |
1 000 | 6.57 | 21.12 | 27.81 | 353.93 | 379.67 | 590.40 |
数据集 | 压缩过程两阶段总体用时/s | ||
---|---|---|---|
QPGC | TSB+S100 | AGGB+S100 | |
Y0 | 9.48 | 3.87 | 3.86 |
WebC | 500.97 | 36.17 | 36.32 |
WikiV | 76.83 | 9.87 | 10.16 |
Y1 | 10 465.31 | 2 954.20 | 2 957.03 |
CHT | 41 377.63 | 1 469.68 | 1 506.51 |
CHP | 50 557.20 | 1 824.11 | 1 869.03 |
Tab. 7 Total time cost comparison
数据集 | 压缩过程两阶段总体用时/s | ||
---|---|---|---|
QPGC | TSB+S100 | AGGB+S100 | |
Y0 | 9.48 | 3.87 | 3.86 |
WebC | 500.97 | 36.17 | 36.32 |
WikiV | 76.83 | 9.87 | 10.16 |
Y1 | 10 465.31 | 2 954.20 | 2 957.03 |
CHT | 41 377.63 | 1 469.68 | 1 506.51 |
CHP | 50 557.20 | 1 824.11 | 1 869.03 |
1 | 柴瑜晗.基于语义图的中文领域概念及关系抽取方法研究与实现[D]. 石家庄: 河北科技大学, 2019: 17. |
CHAI Y H. Research and implementation on the method of Chinese domain concept and relation extraction based on semantic graph[D]. Shijiazhuang: Hebei University of Science and Technology, 2019: 17. | |
2 | LI F, ZHANG Q, GU T, et al. Optimal representation for Web and social network graphs based on K-2-tree[J]. IEEE Access, 2019, 7: 52945-52954. 10.1109/access.2019.2912172 |
3 | 马超,孙群,陈换新,等. 加权网页排序算法在道路网自动选取中的应用[J]. 武汉大学学报(信息科学版), 2018, 43(8): 1159-1165. |
MA C, SUN Q, CHEN H X, et al. Application of weighted PageRank algorithm in road network auto-selection[J]. Geomatics and Information Science of Wuhan University, 2018, 43(8): 1159-1165. | |
4 | 王婧璇,闫强. 移动应用商店中的平台推荐网络分析[J]. 北京邮电大学学报(社会科学版), 2014, 16(1): 60-64, 72. 10.1007/978-3-319-14254-8_6 |
WANG J X, YAN Q. Platform recommendation network of mobile application store[J]. Journal of Beijing University of Posts and Telecommunications (Social Sciences Edition), 2014, 16(1): 60-64, 72. 10.1007/978-3-319-14254-8_6 | |
5 | 谢婧璘. 面向Web社会网络的分析工具[D]. 上海: 复旦大学, 2011: 6-8. |
XIE J L. Analysis tool for Web social network[D]. Shanghai: Fudan University, 2011: 6-8. | |
6 | 许睿,李琳芳,白林峰. 基于节点关联性的关键蛋白质识别算法研究[J]. 河南科技学院学报(自然科学版), 2016, 44(2): 68-72, 78. |
XU R, LI L F, BAI L F. Identification of essential proteins based on network node correlation[J]. Journal of Henan Institute of Science and Technology (Natural Science Edition), 2016, 44(2): 68-72, 78. | |
7 | AGRAWAL R, BORGIDA A, JAGADISH H V. Efficient management of transitive relationships in large data and knowledge bases[J]. ACM SIGMOD Record, 1989, 18(2): 253-262. 10.1145/66926.66950 |
8 | CHEN Y. On the evaluation of large and sparse graph reachability queries[C]// Proceedings of the 2008 International Conference on Database and Expert Systems Applications, LNCS5181. Berlin: Springer, 2008: 97-105. 10.1007/978-3-540-85654-2_12 |
9 | FAN W, LI J, MA S, et al. Adding regular expressions to graph reachability and pattern queries[C]// Proceedings of the IEEE 27th International Conference on Data Engineering. Piscataway: IEEE, 2011: 39-50. 10.1109/icde.2011.5767858 |
10 | YUNG D, CHANG S K. Fast reachability query computation on big attributed graphs[C]// Proceedings of the 2016 IEEE International Conference on Big Data. Piscataway: IEEE, 2016: 3370-3380. 10.1109/bigdata.2016.7840997 |
11 | ZHANG K, LI K. The optimization reachability query of large scale multi-attribute constraints directed graph[J]. Computer Systems Science and Engineering, 2018, 33(2): 71-85. 10.32604/csse.2018.33.071 |
12 | LABRINIDIS A, JAGADISH H V. Challenges and opportunities with big data[J]. Proceedings of the VLDB Endowment, 2012, 5(12): 2032-2033. 10.14778/2367502.2367572 |
13 | JAGADISH H V, GEHRKE J, LABRINIDIS A, et al. Big data and its technical challenges[J]. Communications of the ACM, 2014, 57(7): 86-94. 10.1145/2611567 |
14 | 微信.2018微信数据报告[EB/OL].[2019-06-01].[EB/OL]. [2019-06-01]. . |
15 | 张宇,刘燕兵,熊刚,等. 图数据表示与压缩技术综述[J]. 软件学报, 2014, 25(9): 1937-1952. |
ZHANG Y, LIU Y B, XIONG G, et al. Survey on succinct representation of graph data[J]. Journal of Software, 2014, 25(9): 1937-1952. | |
16 | 马绪军. 基于MapReduce的可达性保持图研究[D]. 沈阳: 沈阳航空航天大学, 2016: 6-44. 10.35812/cellulosechemtechnol.2020.54.28 |
MA X J. Research on reachability preserving graph based on MapReduce[D]. Shenyang: Shenyang Aerospace University, 2016: 6-44. 10.35812/cellulosechemtechnol.2020.54.28 | |
17 | S̆IMEC̆EK I, LANGR D, TVRDÍK P. Space efficient formats for structure of sparse matrices based on tree structures[C]// Proceedings of the 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Piscataway: IEEE, 2013: 344-351. 10.1109/synasc.2013.52 |
18 | S̆IMEC̆EK I, LANGR D. Tree-based space efficient formats for storing the structure of sparse matrices[J]. Scalable Computing: Practice and Experience, 2014, 15(1): 1-20. 10.12694/scpe.v15i1.962 |
19 | BRISABOA N R, LADRA S, NAVARRO G. K2-trees for compact Web graph representation[C]// Proceedings of the 16th International Symposium on String Processing and Information Retrieval, LNCS5721. Berlin: Springer, 2009: 18-30. |
20 | CLAUDE F, NAVARRO G. Fast and compact Web graph representations[J]. ACM Transactions on the Web, 2010, 4(4): 1-31. 10.1145/1841909.1841913 |
21 | YUAN C, CHAI Y, WEI S. Feature analysis and modeling of the network community structure[J]. Communications in Theoretical Physics, 2012, 58(4): 604-612. 10.1088/0253-6102/58/4/27 |
22 | ASANO Y, MIYAWAKI Y, NISHIZEKI T. Efficient compression of Web graphs[C]// Proceedings of the International Computing and Combinatorics Conference, LNCS5092. Berlin: Springer, 2008: 1-11. 10.1007/978-3-540-69733-6_1 |
23 | BOLDI P, VIGNA S. The WebGraph framework I: compression techniques[C]// Proceedings of the 13th International Conference on World Wide Web. New York: ACM, 2004: 595-602. 10.1145/988672.988752 |
24 | BOLDI P, VIGNA S. The WebGraph framework II: Codes for the World-Wide Web[C]// Proceedings of the 2004 Conference on Data Compression. Piscataway: IEEE, 2004: 528. 10.1109/dcc.2004.1281504 |
25 | DWYER T, RICHE N H, MARRIOTT K, et al. Edge compression techniques for visualization of dense directed graphs[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(12): 2596-2605. 10.1109/tvcg.2013.151 |
26 | DINKLA K, WESTENBERG M A, WIJK J J VAN. Compressed adjacency matrices: untangling gene regulatory networks[J]. IEEE Transactions on Visualization and Computer Graphics, 2012, 18(12): 2457-2466. 10.1109/tvcg.2012.208 |
27 | HERNÁNDEZ C, NAVARRO G. Compressed representation of Web and social networks via dense subgraphs[C]// Proceedings of the 2012 International Symposium on String Processing and Information Retrieval, LNCS7608. Berlin: Springer, 2012: 264-276. |
28 | CHENG J, YU J X, LIN X, et al. Fast computing reachability labelings for large graphs with high compression rate[C]// Proceedings of the 11th International Conference on Extending Database Technology. New York: ACM, 2008: 193-204. 10.1145/1353343.1353370 |
29 | AHO A V, GAREY M R, ULLMAN J D. The transitive reduction of a directed graph[J]. SIAM Journal on Computing, 1972, 1(2): 131-137. 10.1137/0201008 |
30 | FAN W, LI J, WANG X, et al. Query preserving graph compression[C]// Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2012: 157-168. 10.1145/2213836.2213855 |
31 | BUNDY A, WALLEN L. Breadth-first search[M]// Catalogue of Artificial Intelligence Tools. Berlin: Springer, 1984: 13-13. 10.1007/978-3-642-96868-6_25 |
32 | VASSEUR J P, AGARWAL N, THUBERT P, et al. Dynamic directed acyclic graph (DAG) adjustment: US8489765[P]. 2013-07-16. |
33 | DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113. 10.1145/1327452.1327492 |
34 | 谢羿. 基于BFS结果集的可达性保持图并行计算[J]. 中国新技术新产品, 2016(11):35-36. 10.3969/j.issn.1673-9957.2016.11.023 |
XIE Y. Parallel computing of reachability preserving graph based on BFS result set[J]. Chinese New Technologies and New Products, 2016(11):35-36. 10.3969/j.issn.1673-9957.2016.11.023 | |
35 | HAGERUP T, MAAS M. Generalized topological sorting in linear time[J]. Nordic Journal of Computing, 1994, 1(1): 38-49. 10.1145/195058.195202 |
36 | CHENG X, DALE C, LIU J. Dataset for “Statistics and social network of YouTube videos”[EB/OL]. [2019-06-01]. . 10.1109/iwqos.2008.32 |
37 | LESKOVEC J. Stanford large network dataset collection[EB/OL]. [2019-06-01]. . |
[1] | YU Dunhui, WAN Peng, WANG She. Entity association query system based on enterprise knowledge graph construction [J]. Journal of Computer Applications, 2021, 41(9): 2510-2516. |
[2] | LI Chong, WANG Yuchen, DU Weijing, HE Xiaotao, LIU Xuemin, ZHANG Shibo, LI Shuren. PageRank-based talent mining algorithm based on Web of Science [J]. Journal of Computer Applications, 2021, 41(5): 1356-1360. |
[3] | XU Zhoubo, YANG Jian, LIU Huadong, HUANG Wenwen. Protein complex identification algorithm based on XGboost and topological structural information [J]. Journal of Computer Applications, 2020, 40(5): 1510-1514. |
[4] | Ming DU, Anping YANG, Junfeng ZHOU, Ziyang CHEN, Yun YANG. Optimized algorithm for k-step reachability queries on directed acyclic graphs [J]. Journal of Computer Applications, 2020, 40(2): 426-433. |
[5] | WANG Zhuo, SUO Bo, PAN Wei. Parallel algorithm for triangle enumeration [J]. Journal of Computer Applications, 2017, 37(12): 3397-3400. |
[6] | WEN Juping, HU Xiaosheng, LIN Dongmei, ZENG Yaguang. Graph reachability query based on reference node embedding [J]. Journal of Computer Applications, 2016, 36(7): 1998-2005. |
[7] | LIU Chao, TANG Zhengwang, YAO Hong, HU Chengyu, LIANG Qingzhong. Graph data processing technology in cloud platform [J]. Journal of Computer Applications, 2015, 35(1): 43-47. |
[8] | JIANG Yang, PENG Zhiyong, PENG Yuwei. Online pedigree editing system based on graph database [J]. Journal of Computer Applications, 2015, 35(1): 125-130. |
[9] | DING Yue ZHANG Yang LI Zhan-huai WANG Yong. Research and advances on graph data mining [J]. Journal of Computer Applications, 2012, 32(01): 182-190. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||