Journal of Computer Applications (sole official website) ›› 2022, Vol. 42 ›› Issue (4): 999-1010. DOI: 10.11772/j.issn.1001-9081.2021071268

• 36th CCF National Conference on Computer Applications (CCF NCCA 2021)

Survey of high utility pattern mining methods based on positive and negative utility division

Ni ZHANG, Meng HAN, Le WANG, Xiaojuan LI, Haodong CHENG

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021, China
  • Received: 2021-07-16 Revised: 2021-08-13 Accepted: 2021-08-19 Online: 2021-08-13 Published: 2022-04-10
  • Corresponding author: Meng HAN
  • About the authors: ZHANG Ni, born in 1996 in Changzhi, Shanxi, M. S. candidate, CCF member. Her research interests include high utility pattern mining.
    WANG Le, born in 1994 in Baicheng, Jilin, M. S. candidate, CCF member. Her research interests include data stream ensemble classification.
    LI Xiaojuan, born in 1994 in Wuzhong, Ningxia, M. S. candidate, CCF member. Her research interests include data stream ensemble classification.
    CHENG Haodong, born in 1996 in Tai'an, Shandong, M. S. candidate, CCF member. His research interests include high utility pattern mining.
  • Supported by:
    National Natural Science Foundation of China (62062004); Ningxia Natural Science Foundation (2020AAC03216); Postgraduate Innovation Project of North Minzu University (YCX21082)

Abstract:

High Utility Pattern Mining (HUPM) is an emerging research topic in data science that extracts more useful information by taking into account the unit profit and quantity of each item in a transaction database. Traditional HUPM methods assume that every item has a positive utility value, but in practical applications some items may have negative utility (for example, a product sold at a loss has a negative profit), and mining patterns that contain negative items is as important as mining patterns that contain only positive items. Firstly, the relevant concepts of HUPM are explained, with examples of both positive and negative utility. Then, HUPM methods are divided into those with positive utility and those with negative utility: the positive-utility methods are further classified from the novel perspective of dynamic versus static databases, while the negative-utility methods cover key techniques based on Apriori, trees, utility lists, and arrays. These methods are discussed and summarized from different aspects. Finally, the shortcomings of existing HUPM methods and directions for future research are presented.

Key words: pattern mining, high utility pattern, positive utility, negative utility, static data, dynamic data
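
The utility notion summarized in the abstract (unit profit multiplied by purchased quantity, with some profits possibly negative) can be illustrated with a minimal sketch. The item names, profits, quantities, and threshold below are hypothetical, and the brute-force enumeration is only a stand-in for illustration; it is not one of the Apriori-, tree-, utility-list-, or array-based techniques that the survey reviews.

```python
from itertools import combinations

# Hypothetical unit profits; item "d" is sold at a loss (negative utility).
unit_profit = {"a": 5, "b": 2, "c": 1, "d": -3}

# Hypothetical transaction database: each transaction maps item -> quantity.
transactions = [
    {"a": 1, "b": 2, "d": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]


def pattern_utility(pattern, db, profit):
    """Sum the utility of `pattern` over every transaction containing all its items."""
    total = 0
    for tx in db:
        if all(item in tx for item in pattern):
            total += sum(profit[item] * tx[item] for item in pattern)
    return total


def high_utility_patterns(db, profit, min_util):
    """Brute force: keep every non-empty itemset whose utility reaches min_util."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            utility = pattern_utility(pattern, db, profit)
            if utility >= min_util:
                result[pattern] = utility
    return result


if __name__ == "__main__":
    for pattern, utility in high_utility_patterns(transactions, unit_profit, 10).items():
        print(pattern, utility)
```

In this toy data the pattern {a, b} has utility 16, while adding the loss-making item d lowers it to 10 for {a, b, d}. At the same time {a, d} has utility 4, below the threshold, although its superset {a, b, d} reaches it, so utility is not monotone under set inclusion and support-style pruning cannot be applied directly.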

CLC Number: