Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (2): 322-328. DOI: 10.11772/j.issn.1001-9081.2017.02.0322

• The 33rd National Database Conference of China (NDBC 2016) •

领域驱动的高效用co-location模式挖掘方法

江万国, 王丽珍, 方圆, 陈红梅   

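  • Corresponding author: WANG Lizhen, lzhwang2005@126.com
  • About the authors: JIANG Wanguo (born 1990), male, from Hanzhong, Shaanxi, M.S. candidate; main research interests: spatial data mining and knowledge discovery. WANG Lizhen (born 1962), female, from Boxing, Shandong, professor, Ph.D., CCF senior member; main research interests: data mining and databases. FANG Yuan (born 1990), female, from Lijiang, Yunnan, Ph.D. candidate; main research interests: spatial data mining and knowledge discovery. CHEN Hongmei (born 1976), female, from Chongqing, associate professor, Ph.D.; main research interests: data mining and knowledge discovery.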

Domain-driven high utility co-location pattern mining method

JIANG Wanguo, WANG Lizhen, FANG Yuan, CHEN Hongmei   

  1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650091, China
  • Received: 2016-08-12; Revised: 2016-09-11; Online: 2017-02-10; Published: 2017-02-11
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61472346, 61662086) and the Natural Science Foundation of Yunnan Province (2016FA026, 2015FB114, 2015FB149).

Abstract:

A spatial co-location pattern is a subset of spatial features whose instances frequently occur together within spatial neighborhoods. Existing interestingness measures for spatial co-location pattern mining do not fully account for the differences between features or between instances of the same feature. In addition, the results of traditional data-driven spatial co-location pattern mining often contain many patterns that are useless or of no interest to users. To address these problems, a more general object of study, the spatial instance with a utility value, was first proposed, and a new Utility Participation Index (UPI) was defined as the interestingness measure for high utility co-location patterns. Second, domain knowledge was formalized as three kinds of semantic rules and applied during mining, and a domain-driven iterative mining framework was put forward. Finally, extensive experiments compared the results mined under different interestingness measures in terms of utility ratio and frequency, and examined how the results changed once semantic rules based on domain knowledge were introduced. The experimental results show that the proposed UPI is a more reasonable measure that takes both frequency and utility into account, and that the domain-driven mining method can effectively find the co-location patterns that users are really interested in.

Keywords: spatial pattern mining, co-location pattern, high utility co-location pattern, interestingness measure, domain-driven, semantic rule
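
The abstract contrasts frequency-only measures with one that also weighs instance utility, but it does not give the UPI formula. As a rough illustration only, the Python sketch below computes the classical participation index (the minimum, over a pattern's features, of the fraction of that feature's instances that take part in some co-located group) alongside a utility-weighted analogue in which each instance counts by its utility. The utility-weighted function, the toy data, the distance threshold, and all names here are assumptions made for illustration; they are not the paper's definition of UPI.

```python
from itertools import combinations, product
from math import dist

# Hypothetical toy data: (feature, instance id, x, y, utility).
INSTANCES = [
    ("A", 1, 0.0, 0.0, 10.0),
    ("A", 2, 5.0, 5.0, 2.0),
    ("B", 1, 0.5, 0.0, 4.0),
    ("B", 2, 9.0, 9.0, 6.0),
]


def row_instances(pattern, instances, d):
    """Brute-force all groups with one instance per feature of `pattern`
    whose members are pairwise within distance `d` of each other."""
    by_feature = [[i for i in instances if i[0] == f] for f in pattern]
    return [
        combo
        for combo in product(*by_feature)
        if all(dist(a[2:4], b[2:4]) <= d for a, b in combinations(combo, 2))
    ]


def participation_index(pattern, instances, d):
    """Classical frequency-only measure: min over features of
    (distinct participating instances) / (all instances of the feature)."""
    rows = row_instances(pattern, instances, d)
    ratios = []
    for k, f in enumerate(pattern):
        total = sum(1 for i in instances if i[0] == f)
        involved = {r[k][1] for r in rows}
        ratios.append(len(involved) / total)
    return min(ratios)


def utility_weighted_index(pattern, instances, d):
    """Assumed utility-weighted analogue (NOT the paper's UPI): min over
    features of (utility of participating instances) / (total utility)."""
    rows = row_instances(pattern, instances, d)
    ratios = []
    for k, f in enumerate(pattern):
        total_u = sum(i[4] for i in instances if i[0] == f)
        involved = {r[k][1]: r[k][4] for r in rows}  # dedupe by instance id
        ratios.append(sum(involved.values()) / total_u)
    return min(ratios)


pi = participation_index(("A", "B"), INSTANCES, d=1.0)
uw = utility_weighted_index(("A", "B"), INSTANCES, d=1.0)
```

In this toy data only A1 and B1 are neighbors, so both features have half of their instances participating, yet the participating A instance carries most of feature A's utility while the participating B instance carries less than half of feature B's. The two measures therefore rank the same pattern differently, which is the kind of difference between features and instances that the abstract says frequency-only measures overlook.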

CLC number: