Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (12): 3867-3875. DOI: 10.11772/j.issn.1001-9081.2023111707
• Advanced computing •
Bo LI1,2, Jianqiang HUANG1,2, Dongqiang HUANG1,2, Xiaoying WANG1
Received: 2023-12-11
Revised: 2024-02-07
Accepted: 2024-03-01
Online: 2024-05-30
Published: 2024-12-10
Contact: Jianqiang HUANG
About author: LI Bo, born in 1998 in Heze, Shandong, M. S. candidate, CCF member. His research interests include high performance computing.
Supported by:
CLC Number:
Bo LI, Jianqiang HUANG, Dongqiang HUANG, Xiaoying WANG. Adaptive computing optimization of sparse matrix-vector multiplication based on heterogeneous platforms[J]. Journal of Computer Applications, 2024, 44(12): 3867-3875.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023111707
| Feature parameter | Description | Time complexity |
| --- | --- | --- |
| nnz | total number of non-zero elements in the matrix | O(1) |
| m | number of rows of the matrix | O(1) |
| n | number of columns of the matrix | O(1) |
| max_nnz | maximum number of non-zero elements in any row | O(nnz) |
| min_nnz | minimum number of non-zero elements in any row | O(nnz) |
| mid_nnz | bin width, (max_nnz - min_nnz)/m | O(1) |
| row_mu | average number of non-zero elements per row | O(1) |
| col_mu | average number of non-zero elements per column | O(1) |
| sd | standard deviation of non-zero elements per row | O(nnz) |
| Rsd | coefficient of variation of the matrix, sd/row_mu | O(1) |

Tab. 1 Selected feature parameters
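All of the Tab. 1 features can be read directly off the CSR representation of a matrix. Below is a minimal sketch in Python with SciPy; the function name `extract_features` and the example matrix are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix


def extract_features(A: csr_matrix) -> dict:
    """Compute the Tab. 1 feature parameters from a CSR matrix."""
    m, n = A.shape
    row_nnz = np.diff(A.indptr)          # non-zeros per row
    nnz = int(A.nnz)
    max_nnz = int(row_nnz.max())
    min_nnz = int(row_nnz.min())
    row_mu = nnz / m                     # average non-zeros per row
    col_mu = nnz / n                     # average non-zeros per column
    sd = float(row_nnz.std())            # std. deviation of row lengths
    return {
        "nnz": nnz, "m": m, "n": n,
        "max_nnz": max_nnz, "min_nnz": min_nnz,
        "mid_nnz": (max_nnz - min_nnz) / m,   # bin width as defined in Tab. 1
        "row_mu": row_mu, "col_mu": col_mu,
        "sd": sd, "Rsd": sd / row_mu,         # coefficient of variation
    }


if __name__ == "__main__":
    A = csr_matrix(np.array([[1, 0, 2], [0, 0, 0], [3, 4, 5]], dtype=np.float64))
    print(extract_features(A))
```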
| Pearson coefficient | Correlation level |
| --- | --- |
| (0.8, 1.0] | very strong |
| (0.6, 0.8] | strong |
| (0.4, 0.6] | moderate |
| (0.2, 0.4] | weak |
| (0.0, 0.2] | very weak |

Tab. 2 Correlation levels determined by Pearson coefficient
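Feature relevance is judged by the Pearson correlation between a feature and measured SpMV performance, graded by the Tab. 2 thresholds. A small sketch follows; `correlation_level` and the sample vectors are placeholders, not values from the paper.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_level(r: float) -> str:
    """Map |r| to the qualitative levels of Tab. 2."""
    r = abs(r)
    for lo, hi, label in [(0.8, 1.0, "very strong"), (0.6, 0.8, "strong"),
                          (0.4, 0.6, "moderate"), (0.2, 0.4, "weak"),
                          (0.0, 0.2, "very weak")]:
        if lo < r <= hi:
            return label
    return "uncorrelated"


# Placeholder vectors: one feature value and one performance value per matrix.
feature = np.array([21.4, 3.8, 6.5, 6.0, 5.9])
gflops = np.array([18.2, 25.1, 22.4, 23.0, 23.3])
r, _ = pearsonr(feature, gflops)
print(f"Pearson r = {r:.2f} -> {correlation_level(r)}")
```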
| Model | Accuracy without grid search/% | Accuracy with grid search/% |
| --- | --- | --- |
| XGBoost | 85 | 89 |
| LightGBM | 83 | 87 |

Tab. 3 Accuracy of model classification
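Tab. 3 shows that tuning the classifiers with grid search raises accuracy by about four percentage points. Below is a minimal sketch of that workflow with scikit-learn's GridSearchCV and XGBoost; the synthetic data and parameter grid are assumptions, since the paper's exact settings are not listed here. LightGBM's `LGBMClassifier` can be swapped in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for (matrix features -> best storage format) training data.
X, y = make_classification(n_samples=500, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative hyper-parameter grid, not the grid used in the paper.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(XGBClassifier(eval_metric="mlogloss"),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```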
| Performance metric | cuSPARSE CSR | cuSPARSE HYB | cuSPARSE ELL | Adaptive computing optimization |
| --- | --- | --- | --- | --- |
| GFlops | 20.5 | 13.7 | 11.0 | 28.7 |
| Speedup | 1.4 | 2.1 | 2.6 | 1.0 |

Tab. 4 Average GFlops and speedup ratio
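The Tab. 4 metrics follow the usual SpMV conventions: each stored non-zero costs one multiply and one add, so GFlops = 2·nnz / (time × 10^9), and the speedup row is consistent with dividing the adaptive scheme's throughput by each baseline (28.7/20.5 ≈ 1.4, 28.7/13.7 ≈ 2.1, 28.7/11.0 ≈ 2.6). A short sketch of that arithmetic; the helper names are illustrative.

```python
def spmv_gflops(nnz: int, seconds: float) -> float:
    """Two floating-point ops (multiply + add) per stored non-zero."""
    return 2.0 * nnz / seconds / 1e9


def speedup(adaptive_gflops: float, baseline_gflops: float) -> float:
    """Speedup of the adaptive scheme over a baseline format."""
    return adaptive_gflops / baseline_gflops


# Reproducing the speedup row of Tab. 4 from its GFlops row.
baselines = {"CSR": 20.5, "HYB": 13.7, "ELL": 11.0}
adaptive = 28.7
for fmt, g in baselines.items():
    print(f"adaptive vs cuSPARSE {fmt}: {speedup(adaptive, g):.1f}x")
```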
| Sparse matrix | Matrix dimensions | Total non-zeros | Avg. non-zeros per row |
| --- | --- | --- | --- |
| HB/sherman2 | 1 080×1 080 | 23 094 | 21.38 |
| SNAP/sx-askubuntu | 159 316×159 316 | | 3.75 |
| MathWorks/Sieber | 2 290×2 290 | | 6.49 |
| DIMACS10/delaunay_n11 | 2 048×2 048 | 12 254 | 5.98 |
| HB/nos4 | 100×100 | 594 | 5.94 |

Tab. 5 Sparse matrix parameters
| GPU architecture | CSR | ELL | HYB | BCSR |
| --- | --- | --- | --- | --- |
| P100 | 1.1 | 2.0 | 1.7 | 2.4 |
| RTX A500 | 1.7 | 1.8 | 2.8 | 2.2 |

Tab. 6 Speedup ratios under other two architectures
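Putting the pieces together, the adaptive scheme amounts to: extract the Tab. 1 features, let the trained classifier predict the best storage format, and launch the matching SpMV kernel. The skeleton below is a hypothetical Python dispatch, not the paper's implementation; the GPU kernel calls are left as placeholders.

```python
from scipy.sparse import csr_matrix

# Candidate storage formats, indexed by the classifier's predicted label.
FORMATS = ["CSR", "ELL", "HYB", "BCSR"]


def adaptive_spmv(A: csr_matrix, x, model, extract_features):
    """Pick a storage format for A with the trained classifier, then run SpMV.

    `model` is a trained classifier (e.g. the grid-searched XGBoost model) and
    `extract_features` the routine sketched after Tab. 1; the feature order
    must match the one used during training.
    """
    feats = list(extract_features(A).values())
    fmt = FORMATS[int(model.predict([feats])[0])]
    # Placeholder kernel: a real implementation would convert A to `fmt`
    # and launch the corresponding GPU kernel (e.g. via cuSPARSE).
    return A @ x, fmt
```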