Journal of Computer Applications, 2019, Vol. 39, Issue (3): 719-727. DOI: 10.11772/j.issn.1001-9081.2018081712

• Data Science and Technology •

Survey of frequent pattern mining over data streams

HAN Meng, DING Jian

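  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received: 2018-08-17 Revised: 2018-11-09 Online: 2019-03-10 Published: 2019-03-11
  • About the authors: HAN Meng (1982-), female, native of Shangqiu, Henan, Ph.D., associate professor, CCF member; her main research interests include big data classification and pattern mining. DING Jian (1977-), male, native of Guyuan, Ningxia, associate professor; his main research interests include big data classification and pattern mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61563001) and the Natural Science Foundation of Ningxia (NZ17115).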

  • Contact: HAN Meng

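Abstract: Advanced applications such as fraud detection and trend learning have driven the development of frequent pattern mining over data streams. Unlike mining static data, data stream mining faces problems such as time and space constraints and the combinatorial explosion of itemsets. This paper reviews existing frequent pattern mining algorithms over data streams and analyzes both classical and recent algorithms. Classified by the completeness of the pattern set, frequent patterns in data streams are divided into complete patterns and compressed patterns. Compressed patterns mainly include closed patterns, maximal patterns, top-k patterns, and combinations of the three. The difference is that closed patterns are a lossless compression, whereas the others are lossy. To obtain interesting frequent patterns, patterns based on user constraints can be mined. To handle recent transactions in data streams, algorithms are divided into window-model-based and decay-model-based methods. Pattern mining over data streams also commonly covers sequential patterns and high utility patterns, for which classical and recent algorithms are introduced. Finally, future work on pattern mining over data streams is outlined.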
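To make the difference between the complete pattern set and the compressed (closed, maximal, top-k) pattern sets concrete, here is a minimal Python sketch. It assumes a small static batch of transactions and enumerates itemsets by brute force, so it illustrates only the definitions, not any of the streaming algorithms reviewed in the paper; all function names are hypothetical. It also shows why closed patterns are lossless: the support of every frequent itemset can be recovered as the largest support among its closed supersets, whereas maximal patterns only tell which itemsets are frequent, not their exact supports.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration of the complete set of frequent itemsets (illustration only)."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            s = frozenset(cand)
            sup = sum(1 for t in transactions if s <= t)
            if sup >= min_sup:
                result[s] = sup
                found = True
        # Apriori property: no frequent k-itemset means no frequent (k+1)-itemset
        if not found:
            break
    return result

def closed_patterns(freq):
    """Frequent itemsets with no proper superset of equal support."""
    return {x: s for x, s in freq.items()
            if not any(x < y and s == freq[y] for y in freq)}

def maximal_patterns(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x: s for x, s in freq.items() if not any(x < y for y in freq)}

def top_k_patterns(freq, k):
    """k most frequent itemsets; ties broken by size, then items (assumed rule)."""
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], len(kv[0]), sorted(kv[0])))
    return dict(ranked[:k])

def support_from_closed(x, closed):
    """Lossless recovery: support(x) = max support among closed supersets of x."""
    return max(s for c, s in closed.items() if x <= c)

# Usage on a toy batch (hypothetical data)
transactions = [set("abc"), set("abc"), set("ab"), set("ad")]
freq = frequent_itemsets(transactions, min_sup=2)
closed = closed_patterns(freq)
maximal = maximal_patterns(freq)
print(len(freq), len(closed), len(maximal))  # 7 3 1
print(top_k_patterns(freq, 2))  # {frozenset({'a'}): 4, frozenset({'b'}): 3}
assert all(support_from_closed(x, closed) == s for x, s in freq.items())
```

On this toy batch the seven frequent itemsets compress to three closed itemsets without losing any support value, and to a single maximal itemset, {a, b, c}, from which the supports of its subsets can no longer be read off.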

Keywords: data stream, data stream mining, frequent pattern mining, sequential pattern mining, high utility pattern mining

Abstract: Advanced applications such as fraud detection and trend learning have driven the development of frequent pattern mining over data streams. Compared with static data mining, data stream mining faces additional problems such as time and space constraints and the combinatorial explosion of itemsets. In this paper, existing frequent pattern mining algorithms over data streams were reviewed, and several classical and recent algorithms were analyzed. According to the completeness of the pattern set, frequent patterns of data streams can be divided into complete patterns and compressed patterns. Compressed patterns include closed frequent patterns, maximal frequent patterns, top-k frequent patterns and combinations of them. Among them, only closed frequent patterns provide lossless compression. Constrained frequent pattern mining was used to narrow the result set so that it better satisfies the user's demands. Algorithms based on the sliding window model and the time decay model were used to better handle recent transactions, which play an important role in data stream mining. Moreover, two other common classes of algorithms, sequential pattern mining and high utility pattern mining algorithms, were introduced. Finally, further research directions of frequent pattern mining over data streams were discussed.
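The two ways of emphasizing recent transactions, the sliding window model and the time decay model, can be contrasted with a short Python sketch that tracks the support of a single itemset. The count-based window width and the multiplicative decay factor are assumptions chosen for illustration; real window and decay algorithms maintain many itemsets at once with summary structures. The class names are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Support of one itemset over the most recent `width` transactions (count-based window)."""

    def __init__(self, itemset, width):
        self.itemset = frozenset(itemset)
        # deque drops its oldest element automatically once maxlen is reached
        self.window = deque(maxlen=width)
        self.count = 0

    def add(self, transaction):
        if len(self.window) == self.window.maxlen and self.window[0]:
            # the oldest transaction is about to leave the window: undo its contribution
            self.count -= 1
        hit = self.itemset <= set(transaction)
        self.window.append(hit)
        self.count += int(hit)
        return self.count

class DecayedCounter:
    """Time-decayed support: older occurrences are multiplied by `decay` at each new transaction."""

    def __init__(self, itemset, decay):
        self.itemset = frozenset(itemset)
        self.decay = decay  # assumed 0 < decay <= 1; decay = 1 means no forgetting
        self.value = 0.0

    def add(self, transaction):
        hit = 1.0 if self.itemset <= set(transaction) else 0.0
        self.value = self.value * self.decay + hit
        return self.value

# Usage: the itemset {a, b} appears three times, then disappears (hypothetical stream)
stream = [{"a", "b"}] * 3 + [{"c"}] * 3
window = SlidingWindowCounter({"a", "b"}, width=3)
decayed = DecayedCounter({"a", "b"}, decay=0.5)
for t in stream:
    print(window.add(t), decayed.add(t))
# window counts: 1 2 3 2 1 0; decayed: 1.0 1.5 1.75 0.875 0.4375 0.21875
```

The window forgets an occurrence abruptly once it slides out, while the decayed value shrinks geometrically and never drops exactly to zero; which behavior is preferable depends on how sharply the application should forget old transactions.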

