Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1473-1480.DOI: 10.11772/j.issn.1001-9081.2022040622

• Data science and technology • Previous Articles    

Bat algorithm for high utility itemset mining based on length constraint

Quan YUAN1,2, Chengliang TANG1,2(), Yunpeng XU1,2   

  1. 1.School of Communication and Information Engineering,Chongqing University of Posts and Telecommunications,Chongqing 400065,China
    2.Research Center of New Communication Technology Applications,Chongqing University of Posts and Telecommunications,Chongqing 400065,China
  • Received:2022-05-07 Revised:2022-07-28 Accepted:2022-08-02 Online:2022-09-27 Published:2023-05-10
  • Contact: Chengliang TANG
  • About author:YUAN Quan, born in 1976, M. S., senior engineer. His research interests include big data, data mining, natural language processing.
    TANG Chengliang, born in 1997, M. S. candidate. His research interests include data mining.
    XU Yunpeng, born in 1998, M. S. candidate. His research interests include data mining.

基于长度约束的蝙蝠高效用项集挖掘算法

袁泉1,2, 唐成亮1,2(), 徐雲鹏1,2   

  1. 1.重庆邮电大学 通信与信息工程学院,重庆 400065
    2.重庆邮电大学 通信新技术应用研究中心,重庆 400065
  • 通讯作者: 唐成亮
  • 作者简介:袁泉(1976—),男,湖南邵阳人,高级工程师,硕士,主要研究方向:大数据、数据挖掘、自然语言处理
    唐成亮(1997—),男,四川德阳人,硕士研究生,主要研究方向:数据挖掘 cy.tcl@qq.com
    徐雲鹏(1998—),男,河南漯河人,硕士研究生,主要研究方向:数据挖掘。

Abstract:

In order to mine the High Utility Itemsets (HUIs) that meet the special needs of users, such as the specified number of items, a Bat Algorithm for High Utility Itemset Mining based on Length Constraint (HUIM-LC-BA) was proposed. By combining the Bat Algorithm (BA) and length constraints, a new High Utility Itemset Mining (HUIM) model was constructed, in which the database was transformed into a bitmap matrix to realize efficient utility calculation and database scanning. Then the search space was reduced by using the Redefined Transaction Weighted Utility (RTWU) strategy. Finally, the lengths of the itemsets were pruned according to the items determined by roulette bet selection method and depth first search. Experiments on four datasets showed that, when the maximum length was 6, the number of patterns mined by HUIM-LC-BA was reduced by 91%, 98%, 99% and 97% respectively compared with that of HUIM-BA (High Utility Itemset Mining-Bat Algorithm) with less running time; and under different length constraints, the running time of HUIM-LC-BA is more stable compared to the FHM+ (Faster High-utility itemset Ming plus) algorithm. Experimental results indicate that HUIM-LC-BA can effectively mine HUIs with length constraints and reduce the number of mined patterns.

Key words: High Utility Itemset Mining (HUIM), bat algorithm, length constraint, bitmap matrix, roulette bet selection method

摘要:

为了挖掘满足用户特殊需求,如含指定项目数量的高效用项集(HUI),提出一种基于长度约束的蝙蝠高效用项集挖掘算法(HUIM-LC-BA)。该算法融合蝙蝠算法(BA)和长度约束构建高效用项集挖掘(HUIM)模型,首先将数据库转换为位图矩阵,实现高效的效用计算和数据库扫描;其次,采用重新定义的事务加权效用(RTWU)策略缩减搜索空间;最后,对项集进行长度修剪,使用深度优先搜索和轮盘赌注选择法确定修剪项目。在4个数据集的仿真实验中,当最大长度为6时,与HUIM-BA相比,HUIM-LC-BA挖掘的模式数量分别减少了91%、98%、99%与97%,同时运行时间也少于HUIM-BA;且在不同长度约束条件下,与FHM+ (Faster High-utility itemset Ming plus)算法相比运行时间更稳定。实验结果表明,HUIM-LC-BA能有效挖掘具有长度约束的HUI,并减少挖掘模式的数量。

关键词: 高效用项集挖掘, 蝙蝠算法, 长度约束, 位图矩阵, 轮盘赌注选择法

CLC Number: