Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 465-472. DOI: 10.11772/j.issn.1001-9081.2019081900
• The 36th CCF National Database Conference (NDBC 2019) •
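Received: 2019-08-12
Revised: 2019-11-06
Accepted: 2019-11-08
Online: 2019-11-18
Published: 2020-02-10
Corresponding author: Hongmei CHEN
About the author: MA Dong, born in 1992 in Qujing, Yunnan, M.S. candidate. His research interests include spatial data mining.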
Dong MA, Hongmei CHEN, Lizhen WANG, Qing XIAO
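Abstract:
A spatial co-location pattern is a subset of spatial features whose instances frequently appear together within a neighborhood. Spatial co-location pattern mining methods usually assume that spatial instances are independent of one another. They use the frequency with which a feature's instances participate in pattern instances (the participation ratio) to measure the importance of that feature in a pattern, and the minimum participation ratio over the pattern's features (the participation index) to measure how interesting the pattern is. This ignores some important relationships between spatial features. Co-location patterns with dominant features were therefore proposed to reveal the dominant relationships between spatial features. Existing methods for mining dominant-feature patterns are built on traditional prevalent patterns and their clique instance model; however, the clique instance model may miss dominant relationships among spatial features whose instances do not form cliques. This paper therefore studies dominant feature mining of spatial sub-prevalent co-location patterns based on the star instance model, in order to better reveal the dominant relationships between spatial features and to mine more valuable dominant-feature patterns. First, two metrics for measuring feature dominance are defined. Second, an effective algorithm for mining dominant-feature co-location patterns is designed. Finally, extensive experiments on synthetic and real data sets verify the effectiveness of the proposed algorithm and the practicality of dominant-feature patterns.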
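As a rough illustration of the two measures named in the abstract, the sketch below computes participation ratios and the participation index for a single pattern. The function name `participation_index`, the input layout, and the toy data are assumptions made for this example and are not taken from the paper. The definitions follow the usual ones in the co-location literature: a feature's participation ratio is the fraction of its instances that appear in at least one row instance of the pattern, and the participation index is the minimum of those ratios.

```python
from collections import defaultdict


def participation_index(row_instances, instance_counts):
    """Return (participation ratios, participation index) for one pattern.

    row_instances: iterable of dicts mapping feature -> instance id.
    instance_counts: dict mapping feature -> total number of its instances.
    """
    # Collect the distinct instances of each feature that take part
    # in at least one row instance of the pattern.
    participating = defaultdict(set)
    for row in row_instances:
        for feature, instance_id in row.items():
            participating[feature].add(instance_id)

    ratios = {
        feature: len(participating[feature]) / total
        for feature, total in instance_counts.items()
    }
    # The participation index is the smallest participation ratio.
    return ratios, min(ratios.values())


# Hypothetical toy data (not from the paper): pattern {A, B}.
rows = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 2},
    {"A": 3, "B": 2},
]
counts = {"A": 4, "B": 2}

ratios, pi = participation_index(rows, counts)
print(ratios, pi)
```

In this toy case two of the four A instances and both B instances participate, so the pattern's interestingness is limited by feature A.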
Dong MA, Hongmei CHEN, Lizhen WANG, Qing XIAO. Dominant feature mining of spatial sub-prevalent co-location patterns[J]. Journal of Computer Applications, 2020, 40(2): 465-472.
Feature | Star participation instance | Star row instances |
---|---|---|
D | D.2 | {D.2#,F.5,H.1} |
D | D.3 | {D.3#,F.6,H.1} |
D | D.4 | {D.4#,F.1,H.2} |
F | F.1 | {D.4,F.1#,H.2} |
F | F.5 | {D.2,F.5#,H.1} |
F | F.6 | {D.3,F.6#,H.1} |
H | H.1 | {D.2,F.5,H.1#} {D.2,F.6,H.1#} {D.3,F.5,H.1#} {D.3,F.6,H.1#} |
H | H.2 | {D.1,F.1,H.2#} {D.4,F.1,H.2#} |
H | H.3 | {D.6,F.2,H.3#} {D.6,F.3,H.3#} {D.7,F.2,H.3#} {D.7,F.3,H.3#} |
H | H.4 | {D.5,F.4,H.4#} |
Tab. 1 Star row instances of pattern {D,F,H}
Feature | Contribution (FCR) | Influence (FIR) | Influence ratio index (FIQI) |
---|---|---|---|
D | 0.18 | 0.43 | 0.00 |
F | 0.18 | 0.50 | 0.14 |
H | 0.65 | 1.75 | 0.75 |
Tab. 2 Feature indicators of {D,F,H}
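The figures in Tab. 2 can be cross-checked against Tab. 1 with a few lines of Python. This is only a sketch under two assumptions that this page does not state: that FCR is a feature's share of all star row instances of the pattern, and that FIQI equals 1 minus the smallest FIR in the pattern divided by the feature's own FIR. Both assumptions reproduce the rounded values in Tab. 2, but the paper's formal definitions should be consulted before relying on them.

```python
# Number of star row instances centered on each feature, counted from Tab. 1.
star_row_counts = {"D": 3, "F": 3, "H": 4 + 2 + 4 + 1}
total = sum(star_row_counts.values())

# Assumption: FCR is the feature's share of all star row instances.
fcr = {f: round(n / total, 2) for f, n in star_row_counts.items()}

# FIR values copied from Tab. 2 (already rounded to two decimals).
fir = {"D": 0.43, "F": 0.50, "H": 1.75}
lowest_fir = min(fir.values())

# Assumption: FIQI compares each feature's FIR with the smallest FIR.
fiqi = {f: round(1 - lowest_fir / value, 2) for f, value in fir.items()}

print(fcr)
print(fiqi)
```

Because the FIR inputs are themselves rounded, the FIQI values here are only expected to agree with the table to two decimal places.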
Data set | Number of features | Number of instances | Range (D×D) |
---|---|---|---|
Plantdata | 31 | 356 | 8 000×13 000 |
Beijing-POI | 16 | 23 025 | 22 000×14 000 |
Synthetic data 1 | 10 | 10 000 | 500×500 |
Synthetic data 2 | 10 | 10 000 | 1 000×1 000 |
Synthetic data 3 | 25 | 50 000 | 1 000×1 000 |
Tab. 3 Experimental data set statistics
Data set | Distance threshold (d) | Star participation index threshold (min_sprev) | Contribution threshold (min_fcr) | Influence ratio index threshold (min_fiqi) |
---|---|---|---|---|
Plantdata | 5 000 | 0.3 | 0.3 | 0.3 |
Beijing-POI | 50 | 0.3 | 0.3 | 0.3 |
Synthetic data 1 | 20 | 0.3 | 0.3 | 0.3 |
Synthetic data 2 | 20 | 0.3 | 0.3 | 0.3 |
Synthetic data 3 | 20 | 0.3 | 0.3 | 0.3 |
Tab. 4 Default experimental parameter values of the SDFMA algorithm
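To make the role of the thresholds concrete, the sketch below applies the default `min_fcr` and `min_fiqi` values from Tab. 4 to the indicators of pattern {D,F,H} in Tab. 2. The selection rule used here (a feature counts as dominant when both its FCR and its FIQI reach their thresholds) is an assumption inferred from the parameter names, and the helper `dominant_features` is hypothetical; neither is confirmed by this page.

```python
MIN_FCR = 0.3   # default min_fcr from Tab. 4
MIN_FIQI = 0.3  # default min_fiqi from Tab. 4

# (FCR, FIQI) for each feature of pattern {D,F,H}, copied from Tab. 2.
indicators = {
    "D": (0.18, 0.00),
    "F": (0.18, 0.14),
    "H": (0.65, 0.75),
}


def dominant_features(indicators, min_fcr, min_fiqi):
    """Return features whose FCR and FIQI both reach the thresholds.

    Assumed rule for illustration only; not taken from the paper.
    """
    return sorted(
        feature
        for feature, (fcr, fiqi) in indicators.items()
        if fcr >= min_fcr and fiqi >= min_fiqi
    )


result = dominant_features(indicators, MIN_FCR, MIN_FIQI)
print(result)
```

Under this assumed rule only H passes both thresholds for {D,F,H}.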
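Pattern size | Plantdata, SDFMA | Plantdata, AMDFCP | Beijing-POI, SDFMA | Beijing-POI, AMDFCP |
---|---|---|---|---|
Size-2 patterns | {Tricholoma matsutake, Abies georgei*} {Magnolia sieboldii, Tetracentron sinense*} {Cordyceps sinensis, Fritillaria delavayi*} | None | {Chinese restaurant, hotel*} {café, garden*} {hotel*, parking lot} | None |
Size-3 patterns | {Megacarpaea delavayi, Cordyceps sinensis, Fritillaria delavayi*} {Cordyceps sinensis, Fritillaria delavayi*, Abies georgei*} {Cordyceps sinensis*, Fritillaria delavayi*, Magnolia sieboldii} {Torreya yunnanensis*, Taxus yunnanensis, Cephalotaxus lanceolata*} | {Megacarpaea delavayi*, Cordyceps sinensis, Fritillaria delavayi} {Cordyceps sinensis*, Fritillaria delavayi*, Abies georgei} {Cordyceps sinensis*, Fritillaria delavayi, Magnolia sieboldii} | {Chinese restaurant, hotel*, clothing store*} {hotel*, parking lot, clothing store*} {Chinese restaurant, coffee house*, guesthouse*} | {Chinese restaurant*, hotel*, clothing store} {hotel*, parking lot*, clothing store} |
Tab. 5 Comparison of mining results of SDFMA and AMDFCP on different data sets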