Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (5): 1424-1429.DOI: 10.11772/j.issn.1001-9081.2017.05.1424

Previous Articles     Next Articles

Mining algorithm of maximal fuzzy frequent patterns

ZHANG Haiqing1, LI Daiwei1, LIU Yintian1, GONG Cheng1, YU Xi2   

  1. 1. College of Software Engineering, Chengdu University of Information Technology, Chengdu Sichuan 610225, China;
    2. College of Information Science and Engineering, Chengdu University, Chengdu Sichuan 610106, China
  • Received:2016-10-08 Revised:2016-12-23 Online:2017-05-10 Published:2017-05-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61602064, 61502059), the Scientific Research Foundation of Chengdu University of Information Technology (KYTZ201615).

最大模糊频繁模式挖掘算法

张海清1, 李代伟1, 刘胤田1, 龚程1, 于曦2   

  1. 1. 成都信息工程大学 软件工程学院, 成都 610225;
    2. 成都大学 信息科学与工程学院, 成都 610106
  • 通讯作者: 于曦
  • 作者简介:张海清(1986-),女,山东聊城人,讲师,博士研究生,主要研究方向:大数据分析;李代伟(1976-),男,四川达县人,副教授,硕士研究生,主要研究方向:数据集成与可视化、机器学习;刘胤田(1972-),男,四川隆昌人,教授,博士研究生,主要研究方向:数据挖掘;于曦(1973-)男,吉林长春人,副教授,博士研究生,主要研究方向:决策系统、神经网络。
  • 基金资助:
    国家自然科学基金青年基金资助项目(61602064,61502059);成都信息工程大学科研基金资助项目(KYTZ201615)。

Abstract: Combinatorial explosion and the effectiveness of mining results are the essential challenges of meaningful pattern extraction, a Maximal Fuzzy Frequent Pattern Tree Algorithm (MFFP-Tree) based on base-(second-order-effect) pattern structure and uncertainty consideration of items was proposed. Firstly, the fuzziness of items was analyzed comprehensively, the fuzzy support was given, and the fuzzy weight of items in the transaction data set was analyzed, the candidate item set was trimmed according to the fuzzy pruning strategy. Secondly, the database was scanned once to build FFP-Tree, and the overhead of pattern extraction was reduced based on fuzzy pruning strategy. The FFP-array structure was used to streamline the search method to further reduce the space and time complexity. The experimental results gained from the benchmark datasets reveal that the proposed MFFP-Tree has outstanding performance by comparing with PADS and FPMax* algorithms:the time complexity of the proposed algorithm is optimized by twice to one order of magnitude for different datasets, and the spatial complexity of the proposed algorithm is optimized by one order of magnitude to two orders of magnitude, respectively.

Key words: advanced pattern mining, maximum fuzzy pattern, fuzzy support, base-(second-order-effect) pattern structure, fuzzy pruning strategy

摘要: 针对有效模式挖掘的组合爆炸及挖掘结果信息如何有效表达的问题,提出了一种基于“核心-牵引”结构的修剪候选模式和考虑项目不确定性的最大模糊模式挖掘算法(MFFP-Tree)。首先,综合分析项目的模糊性,提出模糊支持度,分析项目在事务数据集中的模糊权重,依据模糊修剪策略修剪候选项集;其次,仅扫描数据集一次,就能成功构建模糊模式挖掘树,依据模糊剪枝策略减少模式提取的开销,采用FFP-array阵列结构使得搜索方式更精简,从而进一步降低时空开销。根据基准数据集的实验结果,与最大模式挖掘算法PADS和FPMax*对比分析,MFFP-Tree挖掘出的最大模糊模式能够更准确地反映项目与项目之间的关系;算法的时间复杂度能减半甚至低1个数量级;算法的空间复杂度降低1~2个数量级。

关键词: 高级模式挖掘, 最大模糊模式, 模糊支持度, 核心-牵引模式结构, 模糊修剪策略

CLC Number: