[1] LI G, DENG D, WANG J, et al. Pass-Join: a partition-based method for similarity joins [J]. Proceedings of the VLDB endowment, 2011, 5(3): 253-264. [2] JIANG Y, DEND D, WANG J, et al. Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints [C]// Proceedings of the Joint EDBT/ICDT 2013 Workshops. New York: ACM, 2013: 341-348. [3] 荣垂田,徐天任,杜小勇.基于划分的集合相似连接[J].计算机研究与发展,2012,49(10):2066-2076.(RONG C T, XU T R, DU X Y. Partition-based set similarity join [J]. Journal of computer research and development, 2012, 49(10): 2066-2076.) [4] 曹海,骆吉洲,陈懿诚.一种基于数据划分的字符串相似性连接外存算法[J].智能计算机与应用,2012,2(5):31-34.(CAO H, LUO J Z, CHEN Y C. A data-partition based disk algorithm for string join [J]. Intelligent computer and applications, 2012, 2(5): 31-34.) [5] LU J, LIN C, WANG W, et al. String similarity measures and joins with synonyms [C]// Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 373-384. [6] ARASU A, GANTI V, KAUSHIK R. Efficient exact set-similarity joins [C]// Proceedings of the 32nd International Conference on Very Large Data Bases. [S.l.]: VLDB Endowment, 2006: 918-929. [7] XIAO C, WANG W, LIN X. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints [J]. Proceedings of the VLDB endowment, 2008, 1(1): 933-944. [8] WANG J, FENG J, LI G. Trie-Join: efficient trie-based string similarity joins with edit-distance constraints [J]. Proceedings of the VLDB endowment, 2010, 3(1/2): 1219-1230. [9] METWALLY A, FALOUTSOS C. V-SMART-Join: a scalable MapReduce framework for all-pair similarity joins of multisets and vectors [J]. Proceedings of the VLDB endowment, 2012, 5(8): 704-715. [10] DONG X, SRIVASTAVA D. Big data integration [C]// ICDE 2013: Proceedings of the 2013 IEEE 29th International Conference on Data Engineering. Piscataway, NJ: IEEE, 2013: 1245-1248. [11] CHRISTEN P. A survey of indexing techniques for scalable record linkage and deduplication [J]. IEEE transactions on knowledge and data engineering, 2012, 24(9): 1537-1555. [12] CHEN Q, HSU M. Continuous MapReduce for In-DB stream analytics [C]// OTM 2010: Proceedings of the 2010 International Conference on the Move to Meaningful Internet Systems. Berlin: Springer, 2010: 16-34. [13] YAN C, YANG X, YU Z, et al. IncMR: incremental data processing based on MapReduce [C]// CLOUD 2012: Proceedings of the 2012 IEEE 5th International Conference on Cloud Computing. Piscataway, NJ: IEEE, 2012: 534-541. [14] LOGOTHETIS D, YOCUM K. Ad-Hoc data processing in the cloud [J]. Proceedings of the VLDB endowment, 2008, 1(2): 1472-1475. [15] DE FRANCISCI MORALES G, GIONIS A, SOZIO M. Social content matching in MapReduce [J]. Proceedings of the VLDB endowment, 2011, 4(7): 460-469. [16] THUSOO A, SARMA J S, JAIN N, et al. Hive—a petabyte scale data warehouse using Hadoop [C]// ICDE 2010: Proceedings of the 2010 IEEE 26th International Conference on Data Engineering. Piscataway, NJ: IEEE, 2010: 996-1005. [17] PEND D, DABEK F. Large-scale incremental processing using distributed transactions and notifications [C]// OSDI'10: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2010: 1-15. [18] XIAO C, WANG W, LIN X, et al. Efficient similarity joins for near-duplicate detection [J]. ACM transactions on database systems, 2011, 36(3): 15. [19] HE Q, DU C, WANG Q, et al. A parallel incremental extreme SVM classifier [J]. Neurocomputing, 2011, 74(16): 2532-2540. [20] 李璐,王宏志,李建中,等.Ed-Sjoin:一种优化的字符串相似性连接算法[J].计算机研究与发展,2009,46(z2):319-325.(LI L, WANG H Z, LI J Z, et al. Ed-Sjoin: an optimal algorithm for similarity joins with edit distance constraints [J]. Journal of computer research and development, 2009, 46(z2): 319-325.) |