Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (8): 2386-2395.DOI: 10.11772/j.issn.1001-9081.2020101561

Special Issue: 第八届CCF大数据学术会议(CCF Bigdata 2020)

• CCF Bigdata 2020 • Previous Articles     Next Articles

Algorithm for mining top-k high utility itemsets with negative items

SUN Rui, HAN Meng, ZHANG Chunyan, SHEN Mingyao, DU Shiyu   

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received:2020-10-10 Revised:2020-12-07 Online:2021-08-10 Published:2020-12-17
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (62062004), the Ningxia Natural Science Foundation (2020AAC03216), the Computer Application Technology Key Discipline Project of the Autonomous Region (PY1902), the First-Class Discipline Construction of Ningxia Colleges and Universities (Electronic Science and Technology Discipline) (NXYKXY2017A07).

含负项top-k高效用项集挖掘算法

孙蕊, 韩萌, 张春砚, 申明尧, 杜诗语   

  1. 北方民族大学 计算机科学与工程学院, 银川 750021
  • 通讯作者: 韩萌
  • 作者简介:孙蕊(1993-),女,山东邹城人,硕士研究生,主要研究方向:模式挖掘;韩萌(1982-),女(回族),河南商丘人,副教授,博士,CCF会员,主要研究方向:大数据分类、模式挖掘;张春砚(1995-),女,河北张家口人,硕士研究生,主要研究方向:模式挖掘;申明尧(1994-),男,山东菏泽人,硕士研究生,主要研究方向:大数据分类;杜诗语(1996-),女,辽宁抚顺人,硕士研究生,主要研究方向:大数据分类。
  • 基金资助:
    国家自然科学基金资助项目(62062004);宁夏自然科学基金资助项目(2020AAC03216);计算机应用技术自治区重点学科项目(PY1902);宁夏高等学校一流学科建设项目(电子科学与技术学科)(NXYKXY2017A07)。

Abstract: Mininng High Utility Itemsets (HUI) with negative items is one of the emerging itemsets mining tasks. In order to mine the result set of HUI with negative items meeting the user needs, a Top-k High utility itemsets with Negative items (THN) mining algorithm was proposed. In order to improve the temporal and spatial performance of the THN algorithm, a strategy to automatically increase the minimum utility threshold was proposed, and the pattern growth method was used for depth-first search; the search space was pruned by using the redefined subtree utility and the redefined local utility; the transaction merging technology and dataset projection technology were employed to solve the problem of scanning the database for multiple times; in order to increase the utility counting speed, the utility array counting technology was used to calculate the utility of the itemset. Experimental results show that the memory usage of THN algorithm is about 1/60 of that of the HUINIV (High Utility Itemsets with Negative Item Values)-Mine algorithm, and is about 1/2 of that of the FHN (Faster High utility itemset miner with Negative unit profits) algorithm; the THN algorithm takes 1/10 runtime of that of the FHN algorithm; and the THN algorithm achieves better performance on dense datasets.

Key words: itemset mining, high utility itemset, top-k itemset, negative item, positive item

摘要: 含负项高效用项集(HUI)挖掘是新兴的数据挖掘任务之一。为了挖掘满足用户需求的含负项HUI结果集,提出了含负项top-k高效用项集(THN)挖掘算法。为了提升THN算法的时空性能,提出了自动提升最小效用阈值的策略,并采用模式增长方法进行深度优先搜索;使用重新定义的子树效用和重新定义的本地效用修剪搜索空间;使用事务合并技术和数据集投影技术解决多次扫描数据库的问题;为了提高效用计数的速度,使用效用数组计数技术计算项集的效用。实验结果表明,THN算法的内存消耗约为HUINIV-Mine算法的1/60,约为FHN算法的1/2;THN算法的执行时间是FHN算法的1/10;而且该算法在密集数据集上的性能更好。

关键词: 项集挖掘, 高效用项集, top-k项集, 负项, 正项

CLC Number: