Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (1): 165-170. DOI: 10.11772/j.issn.1001-9081.2017061582

• Data Science and Technology •

General bound estimation method for pattern measures over uncertain datasets

WANG Ju1, LIU Fuxian1, JIN Chunjie2

  1. College of Air and Missile Defense, Air Force Engineering University, Xi'an, Shaanxi 710051, China;
  2. Unit 93527, Zhangjiakou, Hebei 075000, China
  • Received: 2017-06-26 Revised: 2017-08-21 Online: 2018-01-10 Published: 2018-01-22
  • Corresponding author: WANG Ju
  • About the authors: WANG Ju (born 1991), female, from Linhe, Inner Mongolia, Ph.D. candidate; research interests: data mining and pattern recognition. LIU Fuxian (born 1962), male, from Heze, Shandong, professor, Ph.D.; research interests: combat simulation and data mining. JIN Chunjie (born 1989), male, from Beichen, Tianjin, assistant engineer; research interests: data mining and pattern recognition.

Abstract: To address bound estimation for pattern measures in constraint-based pattern mining, a general bound estimation method for pattern measures over uncertain datasets is proposed. Based on the characteristics of weighted uncertain transaction databases, a general bound estimation framework for commonly used pattern measures is first designed. A fast method for estimating the upper bound of a pattern measure within this framework is then given. Finally, the upper bounds of two typical pattern measures are estimated to demonstrate the feasibility of the method. In the experiments, the Potential High-Utility Itemsets UPper-bound-based mining (PHUI-UP) algorithm was run with three different upper bounds: the transaction-weighted utilization, the upper bound estimated by the proposed method, and the actual upper bound. The runtime and memory usage of the three variants were compared. The results show that the proposed method estimates the upper bound of pattern utility with low memory usage and short runtime.

Key words: uncertain dataset, pattern measure, bound estimation, constraint-based pattern mining, general estimation framework
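To make the idea of an upper bound on a pattern measure concrete, the following is a minimal sketch of the transaction-weighted utilization (TWU) baseline mentioned in the abstract, not of the proposed estimation framework itself. It assumes a tuple-level uncertainty model in which each transaction carries an existence probability; the toy database `DB` and the helper names `utility`, `twu`, `expected_support`, and `bound_holds` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database (an assumption, not data from the paper):
# each entry is (item -> utility in that transaction, existence probability).
DB = [
    ({"a": 5, "b": 2, "c": 1}, 0.9),
    ({"a": 10, "c": 6}, 0.7),
    ({"b": 4, "c": 3, "d": 8}, 0.6),
    ({"a": 5, "d": 2}, 0.4),
]


def utility(itemset, db):
    """Exact utility: sum of the itemset's own item utilities
    over every transaction that contains the whole itemset."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t, _ in db if items.issubset(t))


def twu(itemset, db):
    """TWU: sum of the full transaction utilities
    over every transaction that contains the whole itemset."""
    items = set(itemset)
    return sum(sum(t.values()) for t, _ in db if items.issubset(t))


def expected_support(itemset, db):
    """Sum of existence probabilities of the transactions containing the itemset
    (assumed tuple-level uncertainty model)."""
    items = set(itemset)
    return sum(p for t, p in db if items.issubset(t))


def bound_holds(db):
    """Check that TWU(X) bounds the utility of X and of every superset of X."""
    universe = sorted({i for t, _ in db for i in t})
    itemsets = [set(c) for r in range(1, len(universe) + 1)
                for c in combinations(universe, r)]
    return all(utility(y, db) <= twu(x, db)
               for x in itemsets for y in itemsets if x <= y)
```

Because TWU of an itemset is never smaller than the utility of the itemset or of any of its supersets, a miner can discard an itemset together with all its extensions once the TWU falls below the utility threshold. A tighter upper bound prunes more candidates, which is the kind of effect the abstract's runtime and memory comparison measures.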

CLC number: