Journal of Computer Applications (official website) ›› 2023, Vol. 43 ›› Issue (7): 2049-2056. DOI: 10.11772/j.issn.1001-9081.2022091333

• The 39th CCF National Database Conference (NDBC 2022) •

Closed high utility quantitative itemset mining algorithm on incremental data

Zhihui SHAN, Meng HAN, Qiang HAN

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received: 2022-09-08 Revised: 2022-10-24 Accepted: 2022-11-04 Online: 2023-07-20 Published: 2023-07-10
  • Contact: Meng HAN
  • About author: SHAN Zhihui, born in 1996 in Zhoukou, Henan, M. S. candidate, CCF member. Her research interests include pattern mining.
    HAN Meng, born in 1982 in Shangqiu, Henan, Ph. D., professor, CCF member. Her research interests include data mining.
    HAN Qiang, born in 1973 in Acheng, Heilongjiang, Ph. D., professor, CCF member. His research interests include workflow technology and trusted software.
  • Supported by:
    National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2020AAC03216)

Abstract:

High Utility Itemset (HUI) mining can provide information about combinations of highly profitable items in a dataset, which is useful for developing effective marketing strategies in real-world applications. However, HUIs only provide the itemsets and their total utility, not the purchased quantities of individual items, whereas in real scenarios item quantities provide more precise information. Therefore, High Utility Quantitative Itemset (HUQI) mining algorithms have been proposed by researchers. To address the problems that current HUQI mining algorithms can only process static data and produce redundant result sets, an incrementally updated quantitative utility list structure was proposed for storing and updating the utility information of items in the dataset, and based on this structure, an algorithm for mining Closed High Utility Quantitative Itemsets (CHUQIs) was proposed. The time and memory consumption of the proposed algorithm were compared with those of the Faster High Utility Quantitative Itemset Miner (FHUQI-Miner) algorithm in terms of the number of result sets, minimum utility threshold, number of batches, and scalability. Experimental results show that the proposed algorithm can process incremental data effectively and mine more interesting itemsets.

Key words: incremental mining, High Utility Itemset (HUI), High Utility Quantitative Itemset (HUQI), Closed High Utility Itemset (CHUI), utility list
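To make the notions of a quantitative itemset and an incrementally updated utility list more concrete, the following is a minimal Python sketch. It is not the authors' structure or algorithm: it assumes the common convention from the HUQI literature that a quantitative item is an (item, quantity) pair whose utility in a transaction is quantity times unit profit, and it omits range quantities, remaining-utility pruning, and the closedness check. The class name `QUtilityLists`, its methods, and the profit table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical unit profits per item (illustrative data, not from the paper).
PROFIT = {"a": 3, "b": 2, "c": 5}


class QUtilityLists:
    """Toy utility lists: (item, quantity) -> {transaction id: utility}."""

    def __init__(self, profit):
        self.profit = profit
        self.lists = defaultdict(dict)
        self.next_tid = 0

    def add_batch(self, batch):
        """Append a batch of transactions; each one maps item -> quantity.

        Only the lists of q-items occurring in the new batch are touched,
        so earlier batches are never rescanned.
        """
        for transaction in batch:
            tid = self.next_tid
            self.next_tid += 1
            for item, qty in transaction.items():
                self.lists[(item, qty)][tid] = qty * self.profit[item]

    def utility(self, q_itemset):
        """Total utility of a set of exact q-items over all stored batches."""
        q_items = list(q_itemset)
        if not q_items:
            return 0
        # Keep only transactions that contain every q-item exactly.
        tids = set(self.lists.get(q_items[0], {}))
        for q in q_items[1:]:
            tids &= set(self.lists.get(q, {}))
        return sum(self.lists[q][tid] for q in q_items for tid in tids)


ul = QUtilityLists(PROFIT)
ul.add_batch([{"a": 2, "b": 1}, {"a": 2, "c": 1}])
u_first = ul.utility({("a", 2), ("b", 1)})   # only transaction 0 matches

ul.add_batch([{"a": 2, "b": 1, "c": 3}])     # a later batch arrives
u_second = ul.utility({("a", 2), ("b", 1)})  # transactions 0 and 2 match
```

In this sketch, a q-itemset would count as high utility when its total utility reaches a user-chosen minimum utility threshold; how the proposed algorithm prunes the search space and filters non-closed itemsets is described in the full paper, not here.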

CLC number: