Ranking-k: effective subspace dominating query algorithm

Journal of Computer Applications, 2015, 35(1): 108-114. DOI: 10.11772/j.issn.1001-9081.2015.01.0108


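LI Qiusheng¹, WU Yadong¹, LIN Maosong², WANG Song²,³, WANG Haiyang¹, FENG Xinmiao¹

1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang Sichuan 621010, China;
2. School of Information Engineering, Southwest University of Science and Technology, Mianyang Sichuan 621010, China;
3. Institute of Electronic Engineering, China Academy of Engineering Physics, Mianyang Sichuan 621010, China

Received: 2014-08-08 Revised: 2014-09-23 Online: 2015-01-01 Published: 2015-01-26
Corresponding author: WU Yadong
About the authors: LI Qiusheng (1989-), male, born in Heze, Shandong, M.S. candidate; research interests: scientific visualization, information visualization. WU Yadong (1979-), male, born in Zhoukou, Henan, professor, Ph.D., CCF member; research interests: image and graphics processing, visualization technology. LIN Maosong (1964-), male, born in Quanjiao, Anhui, professor, Ph.D., CCF member; research interests: image and graphics processing, virtual reality. WANG Song (1989-), male, born in Tongcheng, Anhui, Ph.D. candidate, CCF member; research interests: content-based image and video retrieval, visualization. WANG Haiyang (1990-), male, born in Zhumadian, Henan; research interests: scientific computing visualization. FENG Xinmiao (1991-), female, born in Mianyang, Sichuan, M.S. candidate; research interests: augmented reality.
Foundation items:
National Natural Science Foundation of China (61303127); National Key Technology Research and Development Program of China (2013BAH32F02, 2013BAH32F03); National Defense Key Discipline Laboratory Project (13zxnk12); Key Project of Sichuan Provincial Department of Education (13ZA0169); Sichuan Province Miaozi Project (2014-043); Graduate Innovation Fund of Southwest University of Science and Technology (14ycx057).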

Abstract

Top-k dominating query algorithms consume considerable time and space to build combined indexes over the attributes, and their results are inaccurate when many tuples share the same attribute values. To address these problems, this paper proposes Ranking-k, a subspace dominating query algorithm that combines B⁺-trees with a probability distribution model. First, a B⁺-tree is used to build a sorted list for each attribute of the data to be queried. Second, a round-robin scheduling algorithm scans the sorted lists of the attributes involved in the skyline criterion, generating candidate tuples and obtaining k end tuples. Third, based on the generated candidate tuples and end tuples, the probability distribution model is used to compute the dominance scores of the end tuples. These steps are iterated to refine the query results until the termination condition is satisfied. Experimental results show that, compared with the Basic-Scan Algorithm (BSA), Ranking-k improves query efficiency by 94.43%; compared with the Differential Algorithm (DA), it improves query efficiency by 7.63%; and its query results are closer to the theoretical values than those of the Top-k Dominating with Early Pruning (TDEP) algorithm, BSA and DA.

Key words: Top-k dominating, subspace, Ranking-k algorithm, sorted list, round-robin scheduling algorithm
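The pipeline summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: smaller attribute values are treated as better; plain sorted Python lists stand in for the per-attribute B⁺-trees; the round-robin stopping rule (stop once k tuples have appeared in every list of the query subspace) is one sound choice and not necessarily the paper's "end tuple" rule; and `estimated_score` uses a hypothetical attribute-independence model, because the abstract does not say which probability distribution Ranking-k uses. All function names are illustrative, and `basic_scan` is a brute-force reference that is not necessarily identical to BSA.

```python
from bisect import bisect_left, bisect_right


def dominates(p, q, dims):
    """True if p dominates q on subspace `dims`: no worse on every attribute and
    strictly better on at least one. Assumption: smaller values are preferred."""
    strictly_better = False
    for d in dims:
        if p[d] > q[d]:
            return False
        if p[d] < q[d]:
            strictly_better = True
    return strictly_better


def score(data, i, dims):
    """Exact dominance score: number of tuples that data[i] dominates in `dims`."""
    return sum(dominates(data[i], q, dims) for q in data)


def basic_scan(data, dims, k):
    """Brute-force reference: score every tuple, O(n^2 * |dims|)."""
    scores = {i: score(data, i, dims) for i in range(len(data))}
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in ranked[:k]]


def build_sorted_lists(data):
    """One ascending list of tuple ids per attribute. A plain sorted list stands
    in for a left-to-right scan of the leaves of a per-attribute B+-tree."""
    return [sorted(range(len(data)), key=lambda i: data[i][a])
            for a in range(len(data[0]))]


def round_robin_candidates(lists, dims, k):
    """Read the subspace lists one depth at a time, round-robin, until at least
    k tuples have been seen in every list. Such a tuple is no worse than any
    unseen tuple on every attribute of `dims`, so its score is at least as high;
    hence the top-k scores are always found among the tuples seen so far."""
    seen, complete = {}, 0
    for depth in range(len(lists[0])):
        for d in dims:
            tid = lists[d][depth]
            seen[tid] = seen.get(tid, 0) + 1
            if seen[tid] == len(dims):
                complete += 1
        if complete >= k:
            break
    return list(seen)


def estimated_score(data, i, dims, sorted_values):
    """Hypothetical independence model (not the paper's model): estimate
    n * [P(q >= p on all dims) - P(q == p on all dims)], treating attributes as
    independent. Subtracting the all-equal term keeps ties out of the count."""
    n = len(data)
    p_ge = p_eq = 1.0
    for d in dims:
        vals, v = sorted_values[d], data[i][d]
        lo, hi = bisect_left(vals, v), bisect_right(vals, v)
        p_ge *= (n - lo) / n
        p_eq *= (hi - lo) / n
    return n * (p_ge - p_eq)


def ranking_k_sketch(data, dims, k):
    """Candidates come from the round-robin scan; the final order uses exact
    scores, with the estimate reported alongside for comparison."""
    lists = build_sorted_lists(data)
    sorted_values = [[data[t][a] for t in lst] for a, lst in enumerate(lists)]
    exact = {i: score(data, i, dims) for i in round_robin_candidates(lists, dims, k)}
    ranked = sorted(exact, key=lambda i: (-exact[i], i))
    return [(i, exact[i], estimated_score(data, i, dims, sorted_values))
            for i in ranked[:k]]


data = [(1, 2, 5), (2, 1, 4), (3, 3, 3), (1, 2, 1), (4, 4, 4), (2, 3, 2)]
print(basic_scan(data, [0, 1], 2))                                # [(0, 3), (1, 3)]
print([(i, s) for i, s, _ in ranking_k_sketch(data, [0, 1], 2)])  # [(0, 3), (1, 3)]
```

In the sample, tuples 0 and 3 coincide on attributes 0 and 1, so neither dominates the other; this is the equal-value case the abstract identifies as a source of inaccuracy, and the all-equal term in `estimated_score` exists to keep such ties out of the count. The estimate for tuple 0 (about 4.33 against an exact score of 3) also shows that an independence model can be far off on small data, which is why this sketch keeps exact scores for the final ranking.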


LI Qiusheng, WU Yadong, LIN Maosong, WANG Song, WANG Haiyang, FENG Xinmiao. Ranking-k: effective subspace dominating query algorithm[J]. Journal of Computer Applications, 2015, 35(1): 108-114.

[15] ZHANG Z, LU H, OOI C, et al. Understanding the meaning of a shifted sky: a general framework on extending skyline query [J]. The International Journal on Very Large Data Bases, 2010, 19(2): 181-201.
