Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 426-433.DOI: 10.11772/j.issn.1001-9081.2019081605
• CCF NDBC 2019 • Previous Articles Next Articles
Ming DU1, Anping YANG1, Junfeng ZHOU1(), Ziyang CHEN2, Yun YANG1
Received:
2019-08-16
Revised:
2019-09-23
Accepted:
2019-10-24
Online:
2019-10-31
Published:
2020-02-10
Contact:
Junfeng ZHOU
About author:
DU Ming, born in 1975, Ph. D., associate professor. His research interests include information retrieval, data analysis and mining, natural language processing.通讯作者:
周军锋
作者简介:
杜明(1975—),男,黑龙江虎林人,副教授,博士,主要研究方向:信息检索、数据分析和挖掘、自然语言处理CLC Number:
Ming DU, Anping YANG, Junfeng ZHOU, Ziyang CHEN, Yun YANG. Optimized algorithm for k-step reachability queries on directed acyclic graphs[J]. Journal of Computer Applications, 2020, 40(2): 426-433.
杜明, 杨安平, 周军锋, 陈子阳, 杨云. 有向无环图上k步可达查询优化算法[J]. 《计算机应用》唯一官方网站, 2020, 40(2): 426-433.
Add to citation manager EndNote|Ris|BibTeX
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019081605
顶点 | LIN | LOUT |
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
Tab. 1 Shortest path index based on all points
顶点 | LIN | LOUT |
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
顶点 | ||
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
Tab. 2 Shortest path index based on c and f
顶点 | ||
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
顶点 | ||
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
Tab. 3 Optimized shortest path index based on c and f
顶点 | ||
---|---|---|
a | ||
b | ||
c | ||
d | ||
e | ||
f | ||
g |
数据集 | |||
---|---|---|---|
human | 38 811 | 39 576 | 1.01 |
citeseer | 10 720 | 44 258 | 4.13 |
pubmed | 9 000 | 40 028 | 4.44 |
yago | 6 642 | 42 392 | 6.38 |
arxiv | 6 000 | 66 707 | 11.12 |
Email-EuAll | 231 000 | 223 004 | 0.97 |
uniprot100m | 16 087 295 | 16 087 293 | 1.00 |
uniprot150m | 25 037 600 | 25 037 598 | 1.00 |
WikiTalk | 2 281 879 | 2 311 570 | 1.01 |
soc-LJ | 971 232 | 1 024 140 | 1.05 |
web-Google | 371 764 | 517 805 | 1.39 |
10cit-Patent | 1 097 775 | 1 651 894 | 1.51 |
10citeseerx | 770 539 | 1 501 126 | 1.95 |
citeseerx | 6 540 401 | 15 011 260 | 2.30 |
dbpedia | 3 365 623 | 7 989 191 | 2.37 |
govwild | 8 022 880 | 23 652 610 | 2.95 |
cit-Patents | 3 774 768 | 16 518 947 | 4.38 |
go_uniprot | 6 967 956 | 34 769 339 | 4.99 |
10go-unipr | 469 526 | 3 476 397 | 7.40 |
18 121 168 | 18 359 487 | 1.01 | |
web-uk | 22 753 644 | 38 184 039 | 1.68 |
Tab. 4 Statistics of experimental datasets
数据集 | |||
---|---|---|---|
human | 38 811 | 39 576 | 1.01 |
citeseer | 10 720 | 44 258 | 4.13 |
pubmed | 9 000 | 40 028 | 4.44 |
yago | 6 642 | 42 392 | 6.38 |
arxiv | 6 000 | 66 707 | 11.12 |
Email-EuAll | 231 000 | 223 004 | 0.97 |
uniprot100m | 16 087 295 | 16 087 293 | 1.00 |
uniprot150m | 25 037 600 | 25 037 598 | 1.00 |
WikiTalk | 2 281 879 | 2 311 570 | 1.01 |
soc-LJ | 971 232 | 1 024 140 | 1.05 |
web-Google | 371 764 | 517 805 | 1.39 |
10cit-Patent | 1 097 775 | 1 651 894 | 1.51 |
10citeseerx | 770 539 | 1 501 126 | 1.95 |
citeseerx | 6 540 401 | 15 011 260 | 2.30 |
dbpedia | 3 365 623 | 7 989 191 | 2.37 |
govwild | 8 022 880 | 23 652 610 | 2.95 |
cit-Patents | 3 774 768 | 16 518 947 | 4.38 |
go_uniprot | 6 967 956 | 34 769 339 | 4.99 |
10go-unipr | 469 526 | 3 476 397 | 7.40 |
18 121 168 | 18 359 487 | 1.01 | |
web-uk | 22 753 644 | 38 184 039 | 1.68 |
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 0.91 | 1.18 | 0.39 |
citeseer | 0.77 | 0.33 | 0.12 |
pubmed | 0.70 | 0.27 | 0.11 |
yago | 0.51 | 0.20 | 0.08 |
arxiv | 2.68 | 0.18 | 0.07 |
Email-EuAll | 5.22 | 7.05 | 3.25 |
uniprot100m | 393.76 | 490.95 | 240.47 |
uniprot150m | 635.14 | 764.09 | 342.04 |
WikiTalk | 52.54 | 69.67 | 30.82 |
soc-LJ | 22.73 | 29.64 | 11.82 |
web-Google | 10.12 | 11.35 | 4.67 |
10cit-Patent | 30.22 | 33.50 | 13.75 |
10citeseerx | 39.28 | 23.52 | 8.76 |
citeseerx | 276.14 | 199.60 | 90.80 |
dbpedia | 121.90 | 102.71 | 46.36 |
govwild | 398.22 | 244.84 | 116.42 |
cit-Patents | 1 748.75 | 115.20 | 53.60 |
go_uniprot | 532.72 | 212.64 | 197.32 |
10go-unipr | 46.63 | 14.33 | 6.17 |
416.68 | 553.01 | 236.51 | |
web-uk | 967.76 | 694.39 | 307.19 |
Tab. 5 Index size on different datasets
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 0.91 | 1.18 | 0.39 |
citeseer | 0.77 | 0.33 | 0.12 |
pubmed | 0.70 | 0.27 | 0.11 |
yago | 0.51 | 0.20 | 0.08 |
arxiv | 2.68 | 0.18 | 0.07 |
Email-EuAll | 5.22 | 7.05 | 3.25 |
uniprot100m | 393.76 | 490.95 | 240.47 |
uniprot150m | 635.14 | 764.09 | 342.04 |
WikiTalk | 52.54 | 69.67 | 30.82 |
soc-LJ | 22.73 | 29.64 | 11.82 |
web-Google | 10.12 | 11.35 | 4.67 |
10cit-Patent | 30.22 | 33.50 | 13.75 |
10citeseerx | 39.28 | 23.52 | 8.76 |
citeseerx | 276.14 | 199.60 | 90.80 |
dbpedia | 121.90 | 102.71 | 46.36 |
govwild | 398.22 | 244.84 | 116.42 |
cit-Patents | 1 748.75 | 115.20 | 53.60 |
go_uniprot | 532.72 | 212.64 | 197.32 |
10go-unipr | 46.63 | 14.33 | 6.17 |
416.68 | 553.01 | 236.51 | |
web-uk | 967.76 | 694.39 | 307.19 |
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 10.60 | 5.89 | 5.24 |
citeseer | 22.24 | 3.09 | 2.96 |
pubmed | 24.35 | 2.99 | 3.20 |
yago | 7.45 | 2.61 | 2.55 |
arxiv | 232.31 | 2.58 | 11.45 |
Email-EuAll | 78.03 | 38.41 | 41.55 |
unprot100m | 6 642.67 | 4 299.18 | 3 817.71 |
unprot150m | 12 033.31 | 6 776.78 | 6 725.61 |
WikiTalk | 617.26 | 369.50 | 318.81 |
soc-LJ | 360.48 | 172.819 | 185.40 |
web-Google | 223.31 | 99.04 | 101.70 |
10cit-Patent | 964.10 | 614.03 | 420.48 |
10citeseerx | 1 690.31 | 314.29 | 254.37 |
citeseerx | 14 563.31 | 5 012.91 | 4 166.43 |
dbpedia | 3 989.27 | 1 777.21 | 1 835.07 |
govwild | 12 069.51 | 3 347.31 | 3 769.61 |
cit-Patents | 476 502 | 8 040.33 | 6 065.75 |
go_uniprot | 15 748.51 | 4 079.98 | 7 624.27 |
10go-unipr | 1 453.04 | 328.76 | 584.98 |
6 727.98 | 3 712.88 | 3 664.62 | |
web-uk | 47 093.21 | 3 703.31 | 4 686.35 |
Tab. 6 Index time on different datasets
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 10.60 | 5.89 | 5.24 |
citeseer | 22.24 | 3.09 | 2.96 |
pubmed | 24.35 | 2.99 | 3.20 |
yago | 7.45 | 2.61 | 2.55 |
arxiv | 232.31 | 2.58 | 11.45 |
Email-EuAll | 78.03 | 38.41 | 41.55 |
unprot100m | 6 642.67 | 4 299.18 | 3 817.71 |
unprot150m | 12 033.31 | 6 776.78 | 6 725.61 |
WikiTalk | 617.26 | 369.50 | 318.81 |
soc-LJ | 360.48 | 172.819 | 185.40 |
web-Google | 223.31 | 99.04 | 101.70 |
10cit-Patent | 964.10 | 614.03 | 420.48 |
10citeseerx | 1 690.31 | 314.29 | 254.37 |
citeseerx | 14 563.31 | 5 012.91 | 4 166.43 |
dbpedia | 3 989.27 | 1 777.21 | 1 835.07 |
govwild | 12 069.51 | 3 347.31 | 3 769.61 |
cit-Patents | 476 502 | 8 040.33 | 6 065.75 |
go_uniprot | 15 748.51 | 4 079.98 | 7 624.27 |
10go-unipr | 1 453.04 | 328.76 | 584.98 |
6 727.98 | 3 712.88 | 3 664.62 | |
web-uk | 47 093.21 | 3 703.31 | 4 686.35 |
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 54.04 | 42.36 | 22.49 |
citeseer | 94.97 | 84.81 | 46.68 |
pubmed | 63.44 | 61.99 | 38.49 |
yago | 51.48 | 35.67 | 17.67 |
arxiv | 371.15 | 336.50 | 145.22 |
Email-EuAll | 86.57 | 57.71 | 37.03 |
unprot100m | 71.09 | 60.83 | 11.49 |
unprot150m | 65.43 | 62.15 | 11.45 |
WikiTalk | 105.08 | 77.17 | 48.57 |
soc-LJ | 153.13 | 124.11 | 76.81 |
web-Google | 141.81 | 132.59 | 69.11 |
10cit-Patent | 88.39 | 39.58 | 15.16 |
10citeseerx | 143.41 | 74.84 | 54.21 |
citeseerx | 174.43 | 126.10 | 97.17 |
dbpedia | 149.32 | 131.11 | 72.29 |
govwild | 172.26 | 127.73 | 97.14 |
cit-Patents | 539.40 | 158.39 | 96.79 |
go_uniprot | 184.28 | 54.32 | 36.62 |
10go-unipr | 154.07 | 58.70 | 54.03 |
214.22 | 155.51 | 120.57 | |
web-uk | 248.95 | 242.83 | 207.51 |
Tab. 7 Query time on different datasets when k=3
数据集 | PLL | BFSI-B | kRH |
---|---|---|---|
human | 54.04 | 42.36 | 22.49 |
citeseer | 94.97 | 84.81 | 46.68 |
pubmed | 63.44 | 61.99 | 38.49 |
yago | 51.48 | 35.67 | 17.67 |
arxiv | 371.15 | 336.50 | 145.22 |
Email-EuAll | 86.57 | 57.71 | 37.03 |
unprot100m | 71.09 | 60.83 | 11.49 |
unprot150m | 65.43 | 62.15 | 11.45 |
WikiTalk | 105.08 | 77.17 | 48.57 |
soc-LJ | 153.13 | 124.11 | 76.81 |
web-Google | 141.81 | 132.59 | 69.11 |
10cit-Patent | 88.39 | 39.58 | 15.16 |
10citeseerx | 143.41 | 74.84 | 54.21 |
citeseerx | 174.43 | 126.10 | 97.17 |
dbpedia | 149.32 | 131.11 | 72.29 |
govwild | 172.26 | 127.73 | 97.14 |
cit-Patents | 539.40 | 158.39 | 96.79 |
go_uniprot | 184.28 | 54.32 | 36.62 |
10go-unipr | 154.07 | 58.70 | 54.03 |
214.22 | 155.51 | 120.57 | |
web-uk | 248.95 | 242.83 | 207.51 |
数据集 | BFSI-B | kRH |
---|---|---|
human | 799 564 | 999 956 |
citeseer | 794 836 | 911 257 |
pubmed | 760 636 | 926 193 |
yago | 711 253 | 984 771 |
arxiv | 603 701 | 771 523 |
Email-EuAll | 698 539 | 999 939 |
uniprot100m | 808 389 | 999 995 |
uniprot150m | 785 811 | 999 994 |
WikiTalk | 882 736 | 999 853 |
soc-LJ | 753 982 | 998 973 |
web-Google | 754 241 | 986 717 |
10cit-Patent | 847 771 | 998 752 |
10citeseerx | 710 964 | 977 381 |
citeseerx | 646 989 | 977 857 |
dbpedia | 888 168 | 986 482 |
govwild | 891 929 | 980 872 |
cit-Patents | 717 408 | 998 286 |
go_uniprot | 877 038 | 965 690 |
10go-unipr | 846 093 | 932 241 |
854 191 | 999 701 | |
web-uk | 745 658 | 972 058 |
Tab. 8 Query answers can be determined in constant time when k=3
数据集 | BFSI-B | kRH |
---|---|---|
human | 799 564 | 999 956 |
citeseer | 794 836 | 911 257 |
pubmed | 760 636 | 926 193 |
yago | 711 253 | 984 771 |
arxiv | 603 701 | 771 523 |
Email-EuAll | 698 539 | 999 939 |
uniprot100m | 808 389 | 999 995 |
uniprot150m | 785 811 | 999 994 |
WikiTalk | 882 736 | 999 853 |
soc-LJ | 753 982 | 998 973 |
web-Google | 754 241 | 986 717 |
10cit-Patent | 847 771 | 998 752 |
10citeseerx | 710 964 | 977 381 |
citeseerx | 646 989 | 977 857 |
dbpedia | 888 168 | 986 482 |
govwild | 891 929 | 980 872 |
cit-Patents | 717 408 | 998 286 |
go_uniprot | 877 038 | 965 690 |
10go-unipr | 846 093 | 932 241 |
854 191 | 999 701 | |
web-uk | 745 658 | 972 058 |
1 | ZHANG T, GAO Y, LI C, et al. Distributed reachability queries on massive graphs[C]// Proceedings of the 2019 International Conference on Database Systems for Advanced Applications, LNCS11448. Cham: Springer, 2019: 406-410. |
2 | SENGUPTA N, BAGCHI A, RAMANATH M, et al. Arrow: approximating reachability using random walks over Web-scale graphs[C]// Proceedings of the IEEE 35th International Conference on Data Engineering. Piscataway: IEEE, 2019:470-481. 10.1109/icde.2019.00049 |
3 | XIE X, YANG X, WANG X, et al. BFSI-B: an improved k-hop graph reachability queries for cyber-physical systems[J]. Information Fusion, 2017, 38:35-42. 10.1016/j.inffus.2017.02.009 |
4 | YANO Y, AKIBA T, IWATA Y, et al. Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths[C]// Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. New York: ACM, 2013: 1601-1606. 10.1145/2505515.2505724 |
5 | CHENG J, SHANG Z, CHENG H, et al. Efficient processing of k-hop reachability queries[J]. The VLDB Journal, 2014, 23(2):227-252. 10.1007/s00778-013-0346-6 |
6 | CHENG J, SHANG Z, CHENG H, et al. K-reach: who is in your small world[J]. Proceedings of the VLDB Endowment, 2012, 5(11): 1292-1303. 10.14778/2350229.2350247 |
7 | CHENG J, KE Y, CHU S, et al. Efficient processing of distance queries in large graphs: a vertex cover approach[C]// Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2012: 457-468. 10.1145/2213836.2213888 |
8 | AGRAWAL R, BORGIDA A, JAGADISH H V. Efficient management of transitive relationships in large data and knowledge bases[C]// Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1989:253-262. 10.1145/66926.66950 |
9 | AKIBA T, IWATA Y, YUICHI YOSHIDA Y. Fast exact shortest-path distance queries on large networks by pruned landmark labeling[C]// Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 349-360. 10.1145/2463676.2465315 |
10 | TRIßL S, LESER U. Fast and practical indexing and querying of very large graphs[C]// Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2007: 845-856. 10.1145/1247480.1247573 |
11 | YILDIRIM H, CHAOJI V, ZAKI M J. Grail: scalable reachability index for large graphs[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 276-284. 10.14778/1920841.1920879 |
12 | VELOSO R R, CERF L, MEIRA W, Jr, et al. Reachability queries in very large graphs: a fast refined online search approach[C]// Proceedings of the 17th International Conference on Extending DB Technology. Berling: Springer, 2014: 511-522. |
13 | ZHOU J, ZHOU S, YU J X, et al. DAG reduction: fast answering reachability queries[C]// Proceedings of the 2017 ACM International Conference on Management of Data. New York: ACM, 2017: 375-390. 10.1145/3035918.3035927 |
14 | BOLDI P, SANTINI M, VIGNA S. A large time-aware Web graph[J]. ACM SIGIR Forum, 2008, 42(2): 33-38. 10.1145/1480506.1480511 |
15 | JIN R, RUAN N, DEY S, et al. Scarab: scaling reachability computation on large graphs[C]// Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2012: 169-180. 10.1145/2213836.2213856 |
16 | van SCHAIK S, de MOOR O. A memory efficient reachability data structure through bit vector compression[C]// Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2011: 913-924. 10.1145/1989323.1989419 |
[1] | JIANG Kun, LIU Zheng, ZHU Lei, LI Xiaoxing. Fixed word-aligned partition compression algorithm of inverted list based on directed acyclic graph [J]. Journal of Computer Applications, 2021, 41(3): 727-732. |
[2] | CHENG Wenliang, WANG Zhihong, ZHOU Yu, GUO Yi, ZHAO Junfeng. Design of distributed computing framework for foreign exchange market monitoring [J]. Journal of Computer Applications, 2020, 40(1): 173-180. |
[3] | LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan. Performance optimization of ItemBased recommendation algorithm based on Spark [J]. Journal of Computer Applications, 2017, 37(7): 1900-1905. |
[4] | DING Mengsu, CHEN Shimin. Helius: a lightweight big data processing system [J]. Journal of Computer Applications, 2017, 37(2): 305-310. |
[5] | WANG Guan, WANG Yuxin, CHEN Xin, WANG Fei, GUO He. Heterogeneous scheduling algorithm with immediate successor finish time [J]. Journal of Computer Applications, 2017, 37(1): 12-17. |
[6] | WEN Juping, HU Xiaosheng, LIN Dongmei, ZENG Yaguang. Graph reachability query based on reference node embedding [J]. Journal of Computer Applications, 2016, 36(7): 1998-2005. |
[7] | XU Dongdong, YUAN Lingyun. Method of IOT complex event processing based on event sharing mechanism [J]. Journal of Computer Applications, 2015, 35(2): 326-331. |
[8] | SHEN Dhu ZHU Zhiyu WU Jiang. Reconfigurable hybrid task scheduling algorithm [J]. Journal of Computer Applications, 2014, 34(2): 387-390. |
[9] | LIU Danqi YU Jiong Ying Changtian. Energy efficient scheduling for multiple directed acyclic graph in cloud computing [J]. Journal of Computer Applications, 2013, 33(09): 2410-2415. |
[10] | . Mapping transform of query plan model in grid database based on Petri net [J]. Journal of Computer Applications, 2007, 27(6): 1378-1381. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||