Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (7): 2200-2208.DOI: 10.11772/j.issn.1001-9081.2022060907
Special Issue: 数据科学与技术
• Data science and technology • Previous Articles Next Articles
Hua JIANG, Xing LI(), Huijiao WANG, Jinghai WEI
Received:
2022-06-22
Revised:
2022-09-03
Accepted:
2022-09-09
Online:
2022-10-08
Published:
2023-07-10
Contact:
Xing LI
About author:
JIANG Hua, born in 1963, Ph. D., professor. His research interests include database system, information security.Supported by:
通讯作者:
李星
作者简介:
蒋华(1963—),男,河南信阳人,教授,博士,主要研究方向:数据库系统、信息安全;基金资助:
CLC Number:
Hua JIANG, Xing LI, Huijiao WANG, Jinghai WEI. Cross-level high utility itemsets mining algorithm based on data index structure[J]. Journal of Computer Applications, 2023, 43(7): 2200-2208.
蒋华, 李星, 王慧娇, 韦静海. 基于数据索引结构的跨级高效用项集挖掘算法[J]. 《计算机应用》唯一官方网站, 2023, 43(7): 2200-2208.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022060907
事务编号 | 事务 | 事务编号 | 事务 |
---|---|---|---|
T0 | (1,1) (3,1) | T4 | (1,1) (3,1) (4,1) |
T1 | (5,1) | T5 | (1,2) (3,6) (5,2) |
T2 | (1,1) (2,5) (3,1) (4,3) (5,1) | T6 | (2,2) (3,2) (5,1) |
T3 | (2,4) (3,3) (4,3) (5,1) |
Tab. 1 Sample transaction dataset
事务编号 | 事务 | 事务编号 | 事务 |
---|---|---|---|
T0 | (1,1) (3,1) | T4 | (1,1) (3,1) (4,1) |
T1 | (5,1) | T5 | (1,2) (3,6) (5,2) |
T2 | (1,1) (2,5) (3,1) (4,3) (5,1) | T6 | (2,2) (3,2) (5,1) |
T3 | (2,4) (3,3) (4,3) (5,1) |
项 | 外部效用 | 项 | 外部效用 |
---|---|---|---|
1 | 5 | 4 | 2 |
2 | 2 | 5 | 3 |
3 | 1 |
Tab. 2 External utility values
项 | 外部效用 | 项 | 外部效用 |
---|---|---|---|
1 | 5 | 4 | 2 |
2 | 2 | 5 | 3 |
3 | 1 |
项 | 父项 | 项 | 父项 |
---|---|---|---|
1 | 6 | 4 | 8 |
2 | 6 | 5 | 8 |
3 | 7 | 6 | 7 |
Tab. 3 Taxonomy of items in sample transaction dataset
项 | 父项 | 项 | 父项 |
---|---|---|---|
1 | 6 | 4 | 8 |
2 | 6 | 5 | 8 |
3 | 7 | 6 | 7 |
项 | GWU | 项 | GWU |
---|---|---|---|
1 | 61 | 5 | 79 |
2 | 54 | 6 | 90 |
3 | 90 | 7 | 90 |
4 | 53 | 8 | 87 |
Tab. 4 GWU values of items in sample transaction dataset
项 | GWU | 项 | GWU |
---|---|---|---|
1 | 61 | 5 | 79 |
2 | 54 | 6 | 90 |
3 | 90 | 7 | 90 |
4 | 53 | 8 | 87 |
数据集 | 事务数 | 一般项数 | 抽象项数 | 平均长度 | 最大长度 | 层数 |
---|---|---|---|---|---|---|
Fruithut | 181 970 | 1 265 | 43 | 3.58 | 36 | 4 |
Liquor | 9 284 | 2 626 | 77 | 2.70 | 5 | 7 |
Connect | 67 557 | 129 | 40 | 43.00 | 43 | 4 |
Chainstore | 1 112 949 | 40 086 | 11 936 | 7.20 | 170 | 10 |
Tab. 5 Basic characteristics of datasets
数据集 | 事务数 | 一般项数 | 抽象项数 | 平均长度 | 最大长度 | 层数 |
---|---|---|---|---|---|---|
Fruithut | 181 970 | 1 265 | 43 | 3.58 | 36 | 4 |
Liquor | 9 284 | 2 626 | 77 | 2.70 | 5 | 7 |
Connect | 67 557 | 129 | 40 | 43.00 | 43 | 4 |
Chainstore | 1 112 949 | 40 086 | 11 936 | 7.20 | 170 | 10 |
minutil | 运行时间/s | 内存消耗/MB | ||||
---|---|---|---|---|---|---|
DISCH | ML-HUI Miner | HUI-Miner | DISCH | ML-HUI Miner | HUI-Miner | |
9 000 | 267.58 | 39.63 | 0.08 | 191.86 | 129.92 | 59.15 |
10 000 | 221.65 | 40.34 | 0.07 | 146.43 | 68.22 | 102.86 |
11 000 | 110.69 | 38.17 | 0.06 | 138.24 | 108.13 | 28.41 |
12 000 | 97.71 | 39.76 | 0.05 | 226.25 | 183.20 | 74.19 |
13 000 | 86.93 | 40.59 | 0.05 | 150.76 | 315.74 | 117.65 |
Tab. 6 Performance comparison of DISCH, ML-HUI Miner and HUI-Miner algorithms on Liquor dataset
minutil | 运行时间/s | 内存消耗/MB | ||||
---|---|---|---|---|---|---|
DISCH | ML-HUI Miner | HUI-Miner | DISCH | ML-HUI Miner | HUI-Miner | |
9 000 | 267.58 | 39.63 | 0.08 | 191.86 | 129.92 | 59.15 |
10 000 | 221.65 | 40.34 | 0.07 | 146.43 | 68.22 | 102.86 |
11 000 | 110.69 | 38.17 | 0.06 | 138.24 | 108.13 | 28.41 |
12 000 | 97.71 | 39.76 | 0.05 | 226.25 | 183.20 | 74.19 |
13 000 | 86.93 | 40.59 | 0.05 | 150.76 | 315.74 | 117.65 |
数据集大小占比/% | 运行时间/s | 项集数 |
---|---|---|
20 | 54.9 | 24 |
40 | 77.3 | 53 |
60 | 116.5 | 95 |
80 | 178.7 | 150 |
100 | 258.2 | 224 |
Tab. 7 Scalability of DISCH algorithm when changing dataset size
数据集大小占比/% | 运行时间/s | 项集数 |
---|---|---|
20 | 54.9 | 24 |
40 | 77.3 | 53 |
60 | 116.5 | 95 |
80 | 178.7 | 150 |
100 | 258.2 | 224 |
1 | HAJIHOSEINI M, SOHRABI M K. Mining fuzzy high average-utility itemsets using fuzzy utility lists and efficient pruning approach[J]. Soft Computing, 2022, 26(13): 6063-6086. 10.1007/s00500-022-07123-7 |
2 | ASHRAF M, ABDELKADER T, RADY S, et al. TKN: an efficient approach for discovering top-k high utility itemsets with positive or negative profits[J]. Information Sciences, 2022, 587: 654-678. 10.1016/j.ins.2021.12.024 |
3 | 蒋华,路昕宇,王慧娇,等.基于DBP的Top-k高效用项集挖掘算法[J].计算机工程与设计, 2021, 42(6): 1631-1637. |
JIANG H, LU X Y, WANG H J, et al. Top-k high utility itemsets mining based on DBP[J]. Computer Engineering and Design, 2021, 42(6): 1631-1637. | |
4 | 程浩东,韩萌,张妮,等.基于滑动窗口模型的数据流闭合高效用项集挖掘[J].计算机研究与发展, 2021, 58(11): 2500-2514. 10.7544/issn1000-1239.2021.20200554 |
CHENG H D, HAN M, ZHANG N, et al. Closed high utility itemsets mining over data stream based on sliding window model[J]. Journal of Computer Research and Development, 2021, 58(11): 2500-2514. 10.7544/issn1000-1239.2021.20200554 | |
5 | TUNG N T, NGUYEN L T T, NGUYEN T D D, et al. An efficient method for mining multi-level high utility itemsets[J]. Applied Intelligence, 2022, 52(5): 5475-5496. 10.1007/s10489-021-02681-z |
6 | CAGLIERO L, CHIUSANO S, GARZA P, et al. Discovering high-utility itemsets at multiple abstraction levels [C]// Proceedings of the 2017 European Conference on Advances in Databases and Information Systems, CCIS 767. Cham: Springer, 2017: 224-234. 10.1007/978-3-319-67162-8_22 |
7 | FOURNIER-VIGER P, WANG Y, LIN J C W, et al. Mining cross-level high utility itemsets [C]// Proceedings of the 2020 International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, LNCS 12144. Cham: Springer, 2020: 858-871. |
8 | RAHMATI B, SOHRABI M K. A systematic survey on high utility itemset mining[J]. International Journal of Information Technology and Decision Making, 2019, 18(4): 1113-1185. 10.1142/s0219622019300027 |
9 | LIU M C, QU J F. Mining high utility itemsets without candidate generation [C]// Proceedings of the Proceedings of the 21st ACM International Conference on Information and Knowledge Management. New York: ACM, 2012: 55-64. 10.1145/2396761.2396773 |
10 | FOURNIER-VIGER P, WU C W, ZIDA S, et al. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning [C]// Proceedings of the 2014 International Symposium on Methodologies for Intelligent Systems, LNCS 8502. Cham: Springer, 2014: 83-92. |
11 | KRISHNAMOORTHY S. Pruning strategies for mining high utility itemsets[J]. Expert Systems with Applications, 2015, 42(5): 2371-2381. 10.1016/j.eswa.2014.11.001 |
12 | ZIDA S, FOURNIER-VIGER P, LIN J C W, et al. EFIM: a fast and memory efficient algorithm for high-utility itemset mining[J]. Knowledge and Information Systems, 2017, 51(2): 595-625. 10.1007/s10115-016-0986-0 |
13 | PENG A Y, KOH Y S, RIDDLE P. mHUIMiner: a fast high utility itemset mining algorithm for sparse datasets [C]// Proceedings of the 2017 Pacific-Asia Conference on Knowledge Discovery and Data Mining, LNCS 10235. Cham: Springer, 2017: 196-207. |
14 | DUONG Q H, FOURNIER-VIGER P, RAMAMPIARO H, et al. Efficient high utility itemset mining using buffered utility-lists[J]. Applied Intelligence, 2018, 48(7): 1859-1877. 10.1007/s10489-017-1057-2 |
15 | QU J F, FOURNIER-VIGER P, LIU M C, et al. Mining high utility itemsets using extended chain structure and utility machine[J]. Knowledge-Based Systems, 2020, 208: No.106457. 10.1016/j.knosys.2020.106457 |
16 | WU P, NIU X Z, FOURNIER-VIGER P, et al. UBP-Miner: an efficient bit based high utility itemset mining algorithm[J]. Knowledge-Based Systems, 2022, 248: No.108865. 10.1016/j.knosys.2022.108865 |
17 | 何登平,何宗浩.基于R-list的Top-K高效用项集挖掘算法[J].计算机工程与科学, 2019, 41(7): 1318-1324. 10.3969/j.issn.1007-130X.2019.07.025 |
HE D P, HE Z H. A top-k high utility itemset mining algorithm based on R-list[J]. Computer Engineering and Science, 2019, 41(7): 1318-1324. 10.3969/j.issn.1007-130X.2019.07.025 | |
18 | 张妮,韩萌,王乐,等.基于索引列表的增量高效用模式挖掘算法[J].山东大学学报(工学版), 2022, 52(2): 107-117. |
ZHANG N, HAN M, WANG L, et al. Incremental high utility pattern mining algorithm based on index list[J]. Journal of Shandong University (Engineering Science), 2022, 52(2): 107-117. | |
19 | 孙蕊,韩萌,张春砚,等.含负项top-k高效用项集挖掘算法[J].计算机应用, 2021, 41(8): 2386-2395. |
SUN R, HAN M, ZHANG C Y, et al. Algorithm for mining top-k high utility itemsets with negative items[J]. Journal of Computer Applications, 2021, 41(8): 2386-2395. | |
20 | NOUIOUA M, WANG Y, FOURNIER-VIGER P, et al. TKC: mining top-k cross-level high utility itemsets [C]// Proceedings of the 2020 International Conference on Data Mining Workshops. Piscataway: IEEE, 2020: 673-682. 10.1109/icdmw51313.2020.00095 |
21 | TUNG N T, NGUYEN L T T, NGUYEN T D D, et al. Efficient mining of cross-level high-utility itemsets in taxonomy quantitative databases[J]. Information Sciences, 2022, 587: 41-62. 10.1016/j.ins.2021.12.017 |
22 | FOURNIER-VIGER P, GOMARIZ A, GUENICHE T, et al. SPMF: a Java open-source pattern mining library[J]. Journal of Machine Learning Research, 2014, 15: 3569-3573. |
[1] | Huanhuan LI, Tianqiang HUANG, Xuemei DING, Haifeng LUO, Liqing HUANG. Public traffic demand prediction based on multi-scale spatial-temporal graph convolutional network [J]. Journal of Computer Applications, 2024, 44(7): 2065-2072. |
[2] | Yao DONG, Yixue FU, Yongfeng DONG, Jin SHI, Chen CHEN. Survey of incomplete multi-view clustering [J]. Journal of Computer Applications, 2024, 44(6): 1673-1682. |
[3] | Keshuai YANG, Youxi WU, Meng GENG, Jingyu LIU, Yan LI. Top-k high average utility sequential pattern mining algorithm under one-off condition [J]. Journal of Computer Applications, 2024, 44(2): 477-484. |
[4] | Haodong ZHENG, Hua MA, Yingchao XIE, Wensheng TANG. Knowledge tracing model based on graph neural network blending with forgetting factors and memory gate [J]. Journal of Computer Applications, 2023, 43(9): 2747-2752. |
[5] | Shuo HUANG, Yanhui LI, Jianqiu CAO. PrivSPM: frequent sequential pattern mining algorithm under local differential privacy [J]. Journal of Computer Applications, 2023, 43(7): 2057-2064. |
[6] | Zhihui GAO, Meng HAN, Shujuan LIU, Ang LI, Dongliang MU. Survey of high utility itemset mining methods based on intelligent optimization algorithm [J]. Journal of Computer Applications, 2023, 43(6): 1676-1686. |
[7] | Chaoshuai QI, Wensi HE, Yi JIAO, Yinghong MA, Wei CAI, Suping REN. Survey on anomaly detection algorithms for unmanned aerial vehicle flight data [J]. Journal of Computer Applications, 2023, 43(6): 1833-1841. |
[8] | Yuanjiang LI, Jinsheng QUAN, Yangyi TAN, Tian YANG. Attribute reduction for high-dimensional data based on bi-view of similarity and difference [J]. Journal of Computer Applications, 2023, 43(5): 1467-1472. |
[9] | Xiaomeng SHAO, Meng ZHANG. Temporal convolutional knowledge tracing model with attention mechanism [J]. Journal of Computer Applications, 2023, 43(2): 343-348. |
[10] | Wenquan LI, Yimin MAO, Xindong PENG. Agglomerative hierarchical clustering algorithm based on hesitant fuzzy set [J]. Journal of Computer Applications, 2023, 43(12): 3755-3763. |
[11] | Jun WU, Aijia OUYANG, Lin ZHANG. Statistically significant sequential patterns mining algorithm under influence degree [J]. Journal of Computer Applications, 2022, 42(9): 2713-2721. |
[12] | Shunkun YU, Hongxu YAN. Heuristic attribute value reduction model based on certainty factor [J]. Journal of Computer Applications, 2022, 42(2): 469-474. |
[13] | LIU Shize, QIN Yanjun, WANG Chenxing, SU Lin, KE Qixue, LUO Haiyong, SUN Yi, WANG Baohui. Traffic flow prediction algorithm based on deep residual long short-term memory network [J]. Journal of Computer Applications, 2021, 41(6): 1566-1572. |
[14] | LI Xujuan, PI Jianyong, HUANG Feixiang, JIA Haipeng. Self-generated deep neural network based 4D trajectory prediction [J]. Journal of Computer Applications, 2021, 41(5): 1492-1499. |
[15] | CHEN Kai, YU Yanwei, ZHAO Jindong, SONG Peng. Work location inference method with big data of urban traffic surveillance [J]. Journal of Computer Applications, 2021, 41(1): 177-184. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||