Survey of high utility pattern mining methods based on positive and negative utility division

doi:10.11772/j.issn.1001-9081.2021071268

Journal of Computer Applications, 2022, Vol. 42, Issue 4: 999-1010.

Special Issue: Survey; The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)

Ni ZHANG, Meng HAN, Le WANG, Xiaojuan LI, Haodong CHENG

School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China

Received: 2021-07-16 Revised: 2021-08-13 Accepted: 2021-08-19 Online: 2021-08-13 Published: 2022-04-10
Contact: Meng HAN
About author: ZHANG Ni, born in 1996 in Changzhi, Shanxi, M. S. candidate, CCF member. Her research interests include high utility pattern mining.
WANG Le, born in 1994 in Baicheng, Jilin, M. S. candidate, CCF member. Her research interests include ensemble classification of data streams.
LI Xiaojuan, born in 1994 in Wuzhong, Ningxia, M. S. candidate, CCF member. Her research interests include ensemble classification of data streams.
CHENG Haodong, born in 1996 in Tai'an, Shandong, M. S. candidate, CCF member. His research interests include high utility pattern mining.
Supported by: National Natural Science Foundation of China (62062004); Ningxia Natural Science Foundation (2020AAC03216); Postgraduate Innovation Project of North Minzu University (YCX21082)

Chinese title: 基于正负效用划分的高效用模式挖掘方法综述
Chinese author names: 张妮, 韩萌, 王乐, 李小娟, 程浩东

Abstract

High Utility Pattern Mining (HUPM) is an emerging research topic in data science. It takes into account the unit profit and the quantity of items in a transaction database in order to extract more useful information. Traditional HUPM methods assume that the utility of every item is positive, but in practical applications the utility of some items may be negative (for example, a product sold at a loss has a negative profit), and mining patterns that contain negative items is as important as mining patterns that contain only positive items. Firstly, the relevant concepts of HUPM were explained, and examples of positive and negative utilities were given. Then, the HUPM methods were divided from the positive and negative perspectives: the pattern mining methods with positive utility were further divided, from a novel perspective, into methods for static and dynamic databases; the pattern mining methods with negative utility covered key technologies based on Apriori, trees, utility lists and arrays. These methods were discussed and summarized from different aspects. Finally, the shortcomings of the existing HUPM methods and directions for future research were given.

Key words: pattern mining, high utility pattern, positive utility, negative utility, static data, dynamic data

CLC Number: TP311

Ni ZHANG, Meng HAN, Le WANG, Xiaojuan LI, Haodong CHENG. Survey of high utility pattern mining methods based on positive and negative utility division[J]. Journal of Computer Applications, 2022, 42(4): 999-1010.


Figures/Tables 6

References 59

1	AGRAWAL R， SRIKANT R. Fast algorithms for mining association rules ［C］// Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco， CA： Morgan Kaufmann Publishers Inc.， 1994： 487-499.
2	HAN J， PEI J， YIN Y， et al. Mining frequent patterns without candidate generation： a frequent-pattern tree approach ［J］. Data Mining and Knowledge Discovery， 2004， 8： 53-87. 10.1023/b:dami.0000005258.31418.83
3	TSENG V S， SHIE B E， WU C W， et al. Efficient algorithms for mining high utility itemsets from transactional databases［J］. IEEE Transactions on Knowledge and Data Engineering， 2013， 25（8）： 1772-1786. 10.1109/tkde.2012.59
4	LIN C W， HONG T P， LU W H. An effective tree structure for mining high utility itemsets［J］. Expert Systems with Applications， 2011， 38（6）： 7419-7424. 10.1016/j.eswa.2010.12.082
5	AHMED C F， TANBEER S K， JEONG B S， et al. Efficient tree structures for high utility pattern mining in incremental databases［J］. IEEE Transactions on Knowledge and Data Engineering， 2009， 21（12）： 1708-1721. 10.1109/tkde.2009.46
6	YUN U， RYANG H. Incremental high utility pattern mining with static and dynamic databases［J］. Applied Intelligence， 2015， 42（2）： 323-352. 10.1007/s10489-014-0601-6
7	LIN J C W， GAN W S， HONG T P， et al. An incremental high-utility mining algorithm with transaction insertion［J］. The Scientific World Journal， 2015， 2015：161564.1-161564.15. 10.1155/2015/161564
8	YUN U， RYANG H， LEE G， et al. An efficient algorithm for mining high utility patterns from incremental databases with one database scan［J］. Knowledge-Based Systems， 2017， 124（15）： 188-206. 10.1016/j.knosys.2017.03.016
9	YUN U， NAM H， LEE G， et al. Efficient approach for incremental high utility pattern mining with indexed list structure［J］. Future Generation Computer Systems， 2019， 95： 221-239. 10.1016/j.future.2018.12.029
10	NGUYEN L T T， NGUYEN P， NGUYEN T D D， et al. Mining high-utility itemsets in dynamic profit databases［J］. Knowledge-Based Systems， 2019， 175： 130-144. 10.1016/j.knosys.2019.03.022
11	CHU C J， TSENG V S， LIANG T. An efficient algorithm for mining high utility itemsets with negative item values in large databases［J］. Applied Mathematics and Computation， 2009， 215（2）： 767-778. 10.1016/j.amc.2009.05.066
12	LIN J C W， FOURNIER-VIGER P， GAN W. FHN： an efficient algorithm for mining high-utility itemsets with negative unit profits［J］. Knowledge-Based Systems， 2016， 111： 283-298. 10.1016/j.knosys.2016.08.022
13	SINGH K， SINGH S S， KUMAR A， et al. High utility itemsets mining with negative utility value： a survey［J］. Journal of Intelligent and Fuzzy Systems， 2018， 35（6）： 6551-6562. 10.3233/jifs-18965
14	FOURNIER-VIGER P， LIN J C W， TRUONG-CHI T， et al. A survey of high utility itemset mining［M］// High-Utility Pattern Mining： Theory， Algorithms and Applications. Cham： Springer， 2019： 1-45.
15	JAYSAWAL B P， HUANG J W. SOHUPDS： a single-pass one-phase algorithm for mining high utility patterns over a data stream［C］// SAC 2020： Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing. New York： ACM， 2020： 490-497. 10.1145/3341105.3373928
16	LIU Y， LIAO W K， CHOUDHARY A. A two-phase algorithm for fast discovery of high utility itemsets［C］// Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham： Springer， 2005： 689-695. 10.1007/11430919_79
17	REDDY B A， RAO O S. An improved up-growth high utility itemset mining［J］. International Journal of Computer Applications， 2012， 58（2）： 25-28. 10.5120/9255-3424
18	LAN G C， HONG T P， TSENG V S. An efficient projection-based indexing approach for mining high utility itemsets［J］. Knowledge and Information Systems， 2014， 38：85-107. 10.1007/s10115-012-0492-y
19	LIU M， QU J. Mining high utility itemsets without candidate generation［C］// Proceedings of the 21st ACM International Conference on Information and Knowledge Management. New York： ACM， 2012： 55-64. 10.1145/2396761.2396773
20	FOURNIER-VIGER P， WU C W， ZIDA S， et al. FHM： faster high-utility itemset mining using estimated utility co-occurrence pruning［C］// Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems. Cham： Springer， 2014： 83-92. 10.1007/978-3-319-08326-1_9
21	ZIDA S， FOURNIER-VIGER P， LIN J C W， et al. EFIM： a highly efficient algorithm for high-utility itemset mining ［C］// Proceedings of the 14th Mexican International Conference on Advances in Artificial Intelligence and Soft Computing. Cham： Springer， 2015： 530-546. 10.1007/978-3-319-27060-9_44
22	DUONG Q H， FOURNIER-VIGER P， RAMAMPIARO H， et al. Efficient high utility itemset mining using buffered utility-lists［J］. Applied Intelligence， 2018， 48（7）： 1859-1877. 10.1007/s10489-017-1057-2
23	LIN C W， LAN G C， HONG T P. An incremental mining algorithm for high utility itemsets［J］. Expert Systems with Applications， 2012， 39（8）： 7173-7180. 10.1016/j.eswa.2012.01.072
24	LIN C W， HONG T P， LAN G C， et al. Incrementally mining high utility patterns based on pre-large concept［J］. Applied Intelligence， 2014， 40（2）： 343-357. 10.1007/s10489-013-0467-z
25	FOURNIER-VIGER P， LIN J C W， GUENICHE T， et al. Efficient incremental high utility itemset mining［C］// Proceedings of the 2015 IEEE/ACM International Conference on Automated Software Engineering on Big Data. Cham： Springer， 2015： 1-6.
26	LIU J， WANG K， FUNG B C M. Direct discovery of high utility itemsets without candidate generation［C］// Proceedings of the 12th IEEE International Conference on Data Mining. Piscataway： IEEE， 2012： 984-989. 10.1109/icdm.2012.20
27	LIU J， JU X， ZHANG X， et al. Incremental mining of high utility patterns in one phase by absence and legacy-based pruning［J］. IEEE Access， 2019， 7： 74168-74180. 10.1109/access.2019.2919524
28	TRUONG T， DUONG H， LE B， et al. Efficient algorithms for mining frequent high utility sequences with constraints［J］. Information Sciences， 2021， 568（5）： 239-264. 10.1016/j.ins.2021.01.060
29	GUO F. Research on the improvement of efficient pattern mining algorithm on data stream ［D］. Wuhan： Wuhan University of Technology， 2019： 1-8. (in Chinese)
30	CHU C J， TSENG V S， LIANG T. An efficient algorithm for mining temporal high utility itemsets from data streams［J］. Journal of Systems and Software， 2008， 81（7）： 1105-1117. 10.1016/j.jss.2007.07.026
31	AHMED C F， TANBEER S K， JEONG B S， et al. Interactive mining of high utility patterns over data streams［J］. Expert Systems with Applications， 2012， 39（15）： 11979-11991. 10.1016/j.eswa.2012.03.062
32	WANG L. Research on data stream pattern mining algorithm and application［D］. Dalian： Dalian University of Technology， 2013： 15-20. 10.3940/rina.iccas.2013.30 (in Chinese)
33	GUO S M， GAO H. An efficient algorithm for mining high utility itemsets from data streams based on sliding window techniques［J］. Journal of Harbin Engineering University， 2018， 39（4）： 721-729. 10.11990/jheu.201611075 (in Chinese)
34	ZIHAYAT M， AN A. Mining top-k high utility patterns over data streams［J］. Information Sciences， 2014， 285：138-161. 10.1016/j.ins.2014.01.045
35	TSAI P S M. Mining high utility itemsets in data streams based on the weighted sliding window model［J］. International Journal of Data Mining & Knowledge Management Process， 2014， 4（2）：13-28. 10.5121/ijdkp.2014.4202
36	LAN G C， HONG T P， HUANG J P， et al. On-shelf utility mining with negative item values［J］. Expert Systems with Applications， 2014， 41（7）： 3450-3459. 10.1016/j.eswa.2013.10.049
37	SUBRAMANIAN K， KANDHASAMY P. UP-GNIV： an expeditious high utility pattern mining algorithm for itemsets with negative utility values［J］. International Journal of Information Technology and Management， 2015， 14（1）： 26-42. 10.1504/ijitm.2015.066056
38	XU T， DONG X， XU J， et al. Mining high utility sequential patterns with negative item values［J］. International Journal of Pattern Recognition and Artificial Intelligence， 2017， 31（10）： 1750035-1750037. 10.1142/s0218001417500355
39	SINGH K， SHAKYA H K， SINGH A， et al. Mining of high utility itemsets with negative utility［J］. Expert Systems， 2018， 35（6）： e12296.1-e12296.23. 10.1111/exsy.12296
40	KRISHNAMOORTHY S. Efficiently mining high utility itemsets with negative unit profits［J］. Knowledge-Based Systems， 2017， 145： 1-14. 10.1016/j.knosys.2017.12.035
41	FOURNIER-VIGER P， ZIDA S. FOSHU： faster on-shelf high utility itemset mining — with or without negative unit profit［C］// Proceedings of the 30th International Conference on Annual ACM Symposium on Applied Computing. New York： ACM， 2015：857-864. 10.1145/2695664.2695823
42	LV C W， HUANG D C， LU Y H. High utility sequential pattern mining algorithm with negative terms［J］. Journal of Chinese Computer Systems， 2017， 38（8）： 1724-1729. 10.3969/j.issn.1000-1220.2017.08.013 (in Chinese)
43	CHEN L J. Research on mining algorithm of high-utility itemsets with negative item values ［D］. Fuzhou： Fuzhou University， 2018： 5-15. (in Chinese)
44	SUN R， HAN M， ZHANG C Y， et al. Mining of top-k high utility itemsets with negative utility［J］. Journal of Intelligent and Fuzzy Systems， 2021， 40（3）：5637-5652. 10.3233/jifs-201357
45	SINGH K， KUMAR A， SINGH S S， et al. EHNL： an efficient algorithm for mining high utility itemsets with negative utility value and length constraints［J］. Information Sciences， 2019， 484： 44-70. 10.1016/j.ins.2019.01.056
46	TSENG V S， WU C W， FOURNIER-VIGER P， et al. Efficient algorithms for mining top-k high utility itemsets［J］. IEEE Transactions on Knowledge and Data Engineering， 2016， 28（1）： 54-67. 10.1109/tkde.2015.2458860
47	RYANG H， YUN U. Top-k high utility pattern mining with effective threshold raising strategies［J］. Knowledge-Based Systems， 2015， 76：109-126. 10.1016/j.knosys.2014.12.010
48	SINGH K， SINGH S S， KUMAR A， et al. TKEH： an efficient algorithm for mining top-k high utility itemsets［J］. Applied Intelligence， 2018， 49（3）： 1078-1097. 10.1007/s10489-018-1316-x
49	HAN X， LIU X， LI J， et al. Efficient top-k high utility itemset mining on massive data［J］. Information Sciences， 2021， 557： 382-406. 10.1016/j.ins.2020.08.028
50	LIN J C W， LI T， FOURNIER-VIGER P， et al. An efficient algorithm to mine high average-utility itemsets［J］. Advanced Engineering Informatics， 2016， 30（2）： 233-243. 10.1016/j.aei.2016.04.002
51	LIN C W， HONG T P， LU W H. Efficiently mining high average utility itemsets with a tree structure［C］// Proceedings of the 2010 Asian Conference on Intelligent Information and Database Systems. Cham： Springer， 2010： 131-139. 10.1007/978-3-642-12145-6_14
52	LAN G C， HONG T P， TSENG V S. A projection-based approach for discovering high average-utility itemsets［J］. Journal of Information Science and Engineering， 2012， 28（1）：193-209.
53	LAN G C， HONG T P， TSENG V S. Efficiently mining high average-utility itemsets with an improved upper-bound strategy［J］. International Journal of Information Technology & Decision Making， 2012， 11（5）： 1009-1030. 10.1142/s0219622012500307
54	LU T， VO B， NGUYEN H T， et al. A new method for mining high average utility itemsets［C］// Proceedings of the 2015 IFIP International Conference on Computer Information Systems and Industrial Management. Cham： Springer，2014： 33-42. 10.1007/978-3-662-45237-0_5
55	KIM H， YUN U， BAEK Y， et al. Efficient list based mining of high average utility patterns with maximum average pruning strategies［J］. Information Sciences， 2021， 543（3）： 85-105. 10.1016/j.ins.2020.07.043
56	SRIKANT R， AGRAWAL R. Mining sequential patterns： generalizations and performance improvements［C］// Proceedings of the 5th International Conference on Extending Database Technology. Cham： Springer， 1996： 3-17. 10.1007/bfb0014140
57	PEI J， HAN J， MORTAZAVI-ASL B， et al. Mining sequential patterns by pattern-growth： the prefixspan approach［J］. IEEE Transactions on Knowledge and Data Engineering， 2004， 16（11）： 1424-1440. 10.1109/tkde.2004.77
58	AYRES J， FLANNICK J， GEHRKE J， et al. Sequential pattern mining using a bitmap representation［C］// Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York： ACM， 2002： 429-435. 10.1145/775047.775109
59	FOURNIER-VIGER P， ZHANG Y， LIN J C W， et al. Mining local and peak high utility itemsets［J］. Information Sciences， 2019， 481： 344-367. 10.1016/j.ins.2018.12.070

T_id	Transaction
T₁	（A，1）（B，2）（C，3）（D，4）
T₂	（A，2）（B，3）（C，2）
T₃	（B，2）（C，2）（D，3）（F，4）
T₄	（C，2）（D，3）（E，2）（F，1）
T₅	（A，1）（C，1）（E，2）（F，3）
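
As a worked illustration of the utility definition in the abstract (quantity times unit profit, summed over the items of a pattern and over the transactions that contain it), the following Python sketch evaluates a few itemsets on the transaction table above, using the unit profits from the Item/profit table on this page. The function names and the threshold `MIN_UTIL = 25` are illustrative assumptions, not values taken from the paper, and the brute-force enumeration is only meant for this six-item example.

```python
from itertools import combinations

# Quantities from the transaction table (T1..T5).
DB = {
    1: {"A": 1, "B": 2, "C": 3, "D": 4},
    2: {"A": 2, "B": 3, "C": 2},
    3: {"B": 2, "C": 2, "D": 3, "F": 4},
    4: {"C": 2, "D": 3, "E": 2, "F": 1},
    5: {"A": 1, "C": 1, "E": 2, "F": 3},
}
# Unit profits from the Item/profit table; B and D are sold at a loss.
PROFIT = {"A": 2, "B": -2, "C": 3, "D": -3, "E": 1, "F": 3}

MIN_UTIL = 25  # ASSUMPTION: hypothetical minimum utility threshold


def utility(itemset):
    """Sum quantity * unit profit over every transaction containing all items."""
    items = set(itemset)
    total = 0
    for row in DB.values():
        if items.issubset(row):
            total += sum(row[i] * PROFIT[i] for i in items)
    return total


def high_utility_itemsets(min_util=MIN_UTIL):
    """Brute-force enumeration: exponential in general, fine for six items."""
    found = {}
    for size in range(1, len(PROFIT) + 1):
        for combo in combinations(sorted(PROFIT), size):
            u = utility(combo)
            if u >= min_util:
                found[combo] = u
    return found


print(utility("AC"))  # 26: only positive items
print(utility("CF"))  # 39
print(utility("CD"))  # -9: the loss on D outweighs the profit on C
print(utility("B"))   # -14
```

With these numbers, {C} alone has utility 30 while {C, D} drops to -9, which is the kind of effect that methods for negative utility have to account for.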

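Algorithm	Database	Phases	Key technique	Data structure	Feature	Advantages	Disadvantages
Two-Phase	Static	2	Apriori-based		Two-phase algorithm	Prunes a large number of candidates	Prunes the search space without a tight utility upper bound; requires multiple database scans
UP-Growth	Static	2	Tree-based	Tree	Two-phase algorithm	Performs well at transaction compression	The tree structure is hard to extend
HUP-Growth	Static	2	Tree-based with two-phase model	Tree	Two-phase algorithm	Compact tree structure	Complex data structure
PB	Static	2	Apriori-based; indexing mechanism		Two-phase algorithm	Uses an indexing mechanism to speed up mining; low memory requirement during mining	Shares the drawbacks of two-phase algorithms: generates many candidates and scans the database multiple times
HUI-Miner	Static	1	List-based	Utility list	One-phase algorithm	Removes the repeated dataset scans required by two-phase algorithms	Utility-list join operations are costly; poor scalability
d²HUP	Static	1	List-chain-based	Utility list	One-phase algorithm	Uses an exact utility-list chain and a tighter upper bound	The data structure consumes considerable memory
FHM	Static	1	List-chain-based	Utility list	One-phase algorithm	Keeps the advantages of HUI-Miner and reduces join operations between utility lists	Needs extra storage to maintain the TWU of promising item pairs in the EUCS
EFIM	Static	1	Projection-based	Array	One-phase algorithm	Reduces database storage cost and memory consumption	Requires multiple scans of the original database
ULB-Miner	Static	1	Buffer	Utility list	One-phase algorithm	Reduces memory consumption and speeds up join operations	Relies only on the data structure to reduce running time and space
IHUP [5]	Dynamic	2	Tree-based	Tree	Incremental database	Tree structure is more compact than earlier trees and is extensible	Generates too many HTWUIs in phase one
FUP-HUI-INS	Dynamic	2	Apriori-based	Hash set	Incremental database	Uses the FUP concept, so the previous database is not rescanned every time and some unnecessary candidate itemsets are avoided	Depends on the FUP concept and suffers from combinatorial explosion of the search space
PRE-HUI-INS	Dynamic	2	Apriori-based	Tree	Incremental database	Properties of the pre-large concept avoid database rescans	Handled by the two-phase model, so it has all the limitations of two-phase algorithms
HUPID-Growth	Dynamic	2	Tree/list-based	Tree	Incremental database	Builds the tree with a single database scan and effectively reduces the number of candidate patterns	Its new utility over-estimation strategy requires constructing additional data
HUI-LIST-INS	Dynamic	1	List-based	Utility list	Incremental database	Computes on utility lists without generating candidates; the estimated utility co-occurrence structure speeds up incremental mining	Utility-list join operations are costly
EIHI	Dynamic	1	List-based	HUI-trie list	Incremental database	Stores all itemsets in a trie structure, so their utilities can be updated quickly and mining time is shortened	Efficiency may suffer when the minutil threshold is low and the database is updated frequently
LIHUP	Dynamic	1	List-based	List	Incremental database	Rebuilds the lists with only one database scan and mines efficiently	When the minutil threshold is low and the database is updated frequently, list joins become expensive and efficiency may suffer
IIHUM	Dynamic	1	List-based	Indexed list	Incremental database	Recursively generates conditional indexed utility lists from an indexed global list structure and reorganizes the global list by TWU information, which makes mining more effective	Because TWU information is not used for utility over-estimation, the cost of updating list indexes grows when the minutil threshold is low and the database is updated frequently
THUI-Mine	Dynamic	2	Tree-based	Tree	Data stream	Fast running time; markedly fewer candidate itemsets	Generates many useless candidate itemsets and occupies a large amount of memory during execution
HUPMS	Dynamic	2	Tree-based	Tree	Data stream	Updates the tree structure according to the window contents	Slower than sliding-window frequent pattern mining
HUM-UT	Dynamic	1	Tree-based	Tree	Data stream	No candidate generation and filtering phase	The global header table contains redundant items; low-utility items in the global utility tree are processed to no effect, which lowers running efficiency
HUISW	Dynamic	1	Tree- and list-based	Tree	Data stream	Up to two orders of magnitude faster on dense data streams	Items and their utility information are stored redundantly in the global utility database and the conditional utility databases; low-utility items are still processed needlessly
T-HUDS	Dynamic	2	Tree-based	Tree	Data stream	Uses prefix utility as the utility estimation model, which prunes the search space effectively	The algorithm is hard to extend
HUI_W	Dynamic	2	List-based	List	Data stream	Excludes low-importance patterns and reduces the number of candidate patterns	Time-consuming
SOHUPDS	Dynamic	1	Projection-based	List	Data stream	IUDataListSW efficiently obtains the initial projected database of items	The algorithm is hard to extend
MEFIM	Dynamic	1	Projection-based	Array	Dynamic profit database	Greatly reduces the number of rescans of the original database	High memory usage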
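
The utility-list structure named in the table for HUI-Miner and its successors can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: each entry stores a transaction id, the utility of the itemset in that transaction (`iutil`) and the utility of the items that follow it in the processing order (`rutil`). The unit profits are hypothetical all-positive values (the methods in this table assume positive utilities), items are processed alphabetically rather than in TWU order, and all function names are made up for the example.

```python
from collections import namedtuple

# One utility-list entry per transaction that contains the itemset.
Entry = namedtuple("Entry", "tid iutil rutil")

# Quantities from the transaction table on this page (T1..T5).
DB = {
    1: {"A": 1, "B": 2, "C": 3, "D": 4},
    2: {"A": 2, "B": 3, "C": 2},
    3: {"B": 2, "C": 2, "D": 3, "F": 4},
    4: {"C": 2, "D": 3, "E": 2, "F": 1},
    5: {"A": 1, "C": 1, "E": 2, "F": 3},
}
# ASSUMPTION: hypothetical all-positive unit profits, not values from the paper.
PROFIT = {"A": 2, "B": 2, "C": 3, "D": 3, "E": 1, "F": 3}
# ASSUMPTION: alphabetical processing order instead of ascending TWU.
RANK = {item: pos for pos, item in enumerate(sorted(PROFIT))}


def item_list(item):
    """Build the utility list of a single item with one pass over DB."""
    entries = []
    for tid, row in DB.items():
        if item in row:
            rutil = sum(q * PROFIT[i] for i, q in row.items() if RANK[i] > RANK[item])
            entries.append(Entry(tid, row[item] * PROFIT[item], rutil))
    return entries


def join(px, py, p=None):
    """Utility list of Pxy from the lists of Px and Py.

    p is the list of the shared prefix P, or None when x and y are single items.
    """
    y_by_tid = {e.tid: e for e in py}
    p_by_tid = {e.tid: e for e in p} if p else {}
    joined = []
    for ex in px:
        ey = y_by_tid.get(ex.tid)
        if ey is None:
            continue  # the transaction does not contain both extensions
        shared = p_by_tid[ex.tid].iutil if p else 0  # prefix would be counted twice
        joined.append(Entry(ex.tid, ex.iutil + ey.iutil - shared, ey.rutil))
    return joined


def utility(ulist):
    return sum(e.iutil for e in ulist)


def upper_bound(ulist):
    """With positive utilities, iutil + rutil bounds every later extension."""
    return sum(e.iutil + e.rutil for e in ulist)


a, c, f = item_list("A"), item_list("C"), item_list("F")
ac = join(a, c)
acf = join(ac, join(a, f), p=a)
print(utility(ac), upper_bound(ac))  # 26 49
print(acf)                           # [Entry(tid=5, iutil=14, rutil=0)]
```

The join touches every entry of both lists, which is consistent with the table's remark that utility-list join operations are costly; the exact utility is obtained without rescanning the database.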

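Algorithm	Year	Key technique	Data structure	Advantages	Disadvantages
HUINIV-Mine	2009	Apriori-based		First algorithm to mine HUIs from datasets that contain negative items	Time-consuming; needs a large amount of storage
TS-HOUN	2014	Apriori-based		Avoids generating many redundant candidates and effectively reduces the number of data scans during mining	Needs a lot of running time and memory to complete mining, and generates many candidates
UP-GNIV	2014	Tree-based	Tree	Does not generate candidates	Tree construction is costly
HUSP-NIV	2017	Tree-based	Tree	First algorithm to mine high utility sequential patterns with negative utility	Tree construction is costly and time-consuming
EHIN	2018	Tree-based	Tree	Lowers dataset scanning cost by merging identical transactions, and lowers it further with transaction merging on projected datasets	Tree construction is costly and time-consuming
FHN	2016	Utility-list-based	List	Uses effective pruning strategies to shrink the search space; better suited to dense datasets	Uses a complex data structure; generates candidates that do not occur in the database
GHUM	2017	Utility-list-based	List	Uses simple utility lists; performs well on both sparse and dense datasets	Creating and maintaining utility lists is time-consuming and uses a lot of memory
FOSHU	2015	Utility-list-based	List	Avoids merging the itemsets found in each time period	High memory usage
KOSHU	2017	Utility-list-based	List	Low running time and memory consumption	Utility-list join operations are costly
EHIN	2018	Array-based; tree-based	Tree	Consistently performs better on dense datasets; reduces the search space	Tree construction is costly and time-consuming
EHNL	2019	Array-based; tree-based	Tree	Introduces a minimum length constraint to remove a large number of very small itemsets	Arrays need reserved space; low memory utilization
THN	2021	Utility-list-based	List	Removes the need to set a minimum utility threshold when mining itemsets with both positive and negative items; automatically raising the minimum utility threshold reduces running time	Long running time on sparse datasets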

Item	profit	Item	profit
A	2	D	-3
B	-2	E	1
C	3	F	3
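
The profit table above also shows why pruning by transaction-weighted utility (TWU), which the tables mention for several methods, needs care once negative items are present. The sketch below compares the utility of an itemset with its TWU computed in two ways: from the full transaction utility, and from a redefined transaction utility that sums only positive item utilities. The second variant reflects the general idea commonly described for negative-utility methods; it is shown here as an assumption-laden illustration, and individual algorithms differ in detail. Function names are hypothetical.

```python
# Quantities from the transaction table on this page (T1..T5).
DB = {
    1: {"A": 1, "B": 2, "C": 3, "D": 4},
    2: {"A": 2, "B": 3, "C": 2},
    3: {"B": 2, "C": 2, "D": 3, "F": 4},
    4: {"C": 2, "D": 3, "E": 2, "F": 1},
    5: {"A": 1, "C": 1, "E": 2, "F": 3},
}
# Unit profits from the Item/profit table above.
PROFIT = {"A": 2, "B": -2, "C": 3, "D": -3, "E": 1, "F": 3}


def utility(itemset):
    """Exact utility: quantity * profit over transactions containing the itemset."""
    items = set(itemset)
    return sum(
        sum(row[i] * PROFIT[i] for i in items)
        for row in DB.values()
        if items.issubset(row)
    )


def transaction_utility(row, positive_only=False):
    """Utility of a whole transaction; optionally ignore negative items."""
    utils = [q * PROFIT[i] for i, q in row.items()]
    return sum(u for u in utils if u > 0 or not positive_only)


def twu(itemset, positive_only=False):
    """Sum of transaction utilities over transactions containing the itemset."""
    items = set(itemset)
    return sum(
        transaction_utility(row, positive_only)
        for row in DB.values()
        if items.issubset(row)
    )


for name in ("C", "CF"):
    print(name, utility(name), twu(name), twu(name, positive_only=True))
# C 30 22 66
# CF 39 23 45
```

For {C} the plain TWU (22) is smaller than the actual utility (30), so it is no longer a safe upper bound and pruning on it could discard a high utility itemset; the positive-only variant (66) remains an over-estimate.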
