Journal of Computer Applications, 2020, Vol. 40, Issue (11): 3184-3191. DOI: 10.11772/j.issn.1001-9081.2020040539
• Data science and technology •
YANG Cheng, LU Jiamin, FENG Jun
Received: 2020-04-26
Revised: 2020-06-21
Online: 2020-07-20
Published: 2020-11-10
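Corresponding author: FENG Jun (born 1969), female, native of Wujin, Jiangsu; professor, Ph.D., CCF member. Research interests: spatio-temporal data management, intelligent data processing, data mining, water conservancy informatization. E-mail: fengjun@hhu.edu.cn
About the authors: YANG Cheng (born 1996), female, native of Wuhu, Anhui; master's student, CCF member. Research interests: knowledge graph data management, distributed databases. LU Jiamin (born 1983), male, native of Nantong, Jiangsu; lecturer, Ph.D., CCF member. Research interests: moving object data management, distributed data processing, water conservancy informatization.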
YANG Cheng, LU Jiamin, FENG Jun. Survey of large-scale resource description framework data partitioning methods in distributed environment[J]. Journal of Computer Applications, 2020, 40(11): 3184-3191.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2020040539
[1] Jun FENG, Bingfa WANG, Jiamin LU. Query performance evaluation of distributed resource description framework data management systems[J]. Journal of Computer Applications, 2022, 42(2): 440-448.
[2] Hai LAN, Ke HAN, Li SHEN, Qiu CUI, Yuwei PENG. Accessing optimization with multiple indexes in TiDB[J]. Journal of Computer Applications, 2020, 40(2): 410-415.
[3] FAN Jili, HE Pu, LI Xiaohua, NIE Tiezheng, YU Ge. Blockchain based decentralized item sharing and transaction service system[J]. Journal of Computer Applications, 2019, 39(5): 1330-1335.
[4] GUAN Haoyuan, ZHU Bin, LI Guanyu, CAI Yongjia. Efficient subgraph matching method based on resource description framework graph segmentation and vertex selectivity[J]. Journal of Computer Applications, 2019, 39(2): 360-369.
[5] GUAN Haoyuan, ZHU Bin, LI Guanyu, ZHAO Ling. Efficient subgraph matching method based on structure segmentation of RDF graph[J]. Journal of Computer Applications, 2018, 38(7): 1898-1904.
[6] ZOU Chengming, XIE Yi, WU Pei. Query optimization based on Greenplum database[J]. Journal of Computer Applications, 2018, 38(2): 478-482.
[7] LIN Jiming, BAN Wenjiao, WANG Junyi, TONG Jichao. Query optimization for distributed database based on parallel genetic algorithm and max-min ant system[J]. Journal of Computer Applications, 2016, 36(3): 675-680.
[8] CUAN Linna, SHI Yimin, LI Guanyu, WU Xuehua. Linked sensor data publishing system in semantic Web of things[J]. Journal of Computer Applications, 2015, 35(9): 2440-2446.
[9] HAN Caili, LI Jiajun, ZHANG Xiaopei, XIAO Min. Query expansion method based on semantic property feature graph[J]. Journal of Computer Applications, 2015, 35(2): 440-443.
[10] TANG Cheng-long, XING Chang-zheng. Outlier mining algorithm based on data-partitioning and grid[J]. Journal of Computer Applications, 2012, 32(08): 2193-2197.
[11] LIU Yi, LIN Zi-yu. Concurrency control in distributed database system with score-based method[J]. Journal of Computer Applications, 2011, 31(05): 1404-1408.
[12] Research on optimal placement of data copies in distributed database[J]. Journal of Computer Applications, 2009, 29(09): 2509-2511.
[13] Xue-Lin SHI. Data modeling and retrieval in semantic grid based on RDF[J]. Journal of Computer Applications, 2008, 28(9): 2324-2327.
[14] TANG Di-Bin Jin-Lin WANG Hong NI. Dynamic data storage scheme in CDN - UbDP[J]. Journal of Computer Applications, 2008, 28(8): 1991-1993.
[15] Using logic to optimise the semantic Web query language processing[J]. Journal of Computer Applications, 2006, 26(12): 2800-2802.