Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (12): 3867-3875. DOI: 10.11772/j.issn.1001-9081.2023111707
• Advanced computing •
Bo LI1,2, Jianqiang HUANG1,2, Dongqiang HUANG1,2, Xiaoying WANG1
Received: 2023-12-11
Revised: 2024-02-07
Accepted: 2024-03-01
Online: 2024-05-30
Published: 2024-12-10
Contact: Jianqiang HUANG
About author: LI Bo, born in 1998 in Heze, Shandong, M. S. candidate, CCF member. His research interests include high performance computing.
Supported by:
CLC Number:
Bo LI, Jianqiang HUANG, Dongqiang HUANG, Xiaoying WANG. Adaptive computing optimization of sparse matrix-vector multiplication based on heterogeneous platforms[J]. Journal of Computer Applications, 2024, 44(12): 3867-3875.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023111707
| Feature parameter | Description | Time complexity |
| --- | --- | --- |
| nnz | total number of non-zero elements in the matrix | O(1) |
| m | number of rows of the matrix | O(1) |
| n | number of columns of the matrix | O(1) |
| max_nnz | maximum number of non-zero elements in any row | O(nnz) |
| min_nnz | minimum number of non-zero elements in any row | O(nnz) |
| mid_nnz | bin width, (max_nnz - min_nnz)/m | O(1) |
| row_mu | average number of non-zero elements per row | O(1) |
| col_mu | average number of non-zero elements per column | O(1) |
| sd | standard deviation of non-zero elements per row | O(nnz) |
| Rsd | coefficient of variation of the matrix, sd/row_mu | O(1) |

Tab. 1 Selected feature parameters
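All of the Tab. 1 features can be read directly off the CSR representation of a matrix. Below is a minimal sketch in Python with SciPy; the function name `extract_features` and the example matrix are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix


def extract_features(A: csr_matrix) -> dict:
    """Compute the Tab. 1 feature parameters from a CSR matrix."""
    m, n = A.shape
    row_nnz = np.diff(A.indptr)          # non-zeros per row
    nnz = int(A.nnz)
    max_nnz = int(row_nnz.max())
    min_nnz = int(row_nnz.min())
    row_mu = nnz / m                     # average non-zeros per row
    col_mu = nnz / n                     # average non-zeros per column
    sd = float(row_nnz.std())            # std. deviation of row lengths
    return {
        "nnz": nnz, "m": m, "n": n,
        "max_nnz": max_nnz, "min_nnz": min_nnz,
        "mid_nnz": (max_nnz - min_nnz) / m,   # bin width as defined in Tab. 1
        "row_mu": row_mu, "col_mu": col_mu,
        "sd": sd, "Rsd": sd / row_mu,         # coefficient of variation
    }


if __name__ == "__main__":
    A = csr_matrix(np.array([[1, 0, 2], [0, 0, 0], [3, 4, 5]], dtype=np.float64))
    print(extract_features(A))
```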
| Pearson coefficient | Correlation level |
| --- | --- |
| (0.8, 1.0] | very strong |
| (0.6, 0.8] | strong |
| (0.4, 0.6] | moderate |
| (0.2, 0.4] | weak |
| (0.0, 0.2] | very weak |

Tab. 2 Correlation levels determined by Pearson coefficient
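Feature relevance is judged by the Pearson correlation between a feature and measured SpMV performance, graded by the Tab. 2 thresholds. A small sketch follows; `correlation_level` and the sample vectors are placeholders, not values from the paper.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_level(r: float) -> str:
    """Map |r| to the qualitative levels of Tab. 2."""
    r = abs(r)
    for lo, hi, label in [(0.8, 1.0, "very strong"), (0.6, 0.8, "strong"),
                          (0.4, 0.6, "moderate"), (0.2, 0.4, "weak"),
                          (0.0, 0.2, "very weak")]:
        if lo < r <= hi:
            return label
    return "uncorrelated"


# Placeholder vectors: one feature value and one performance value per matrix.
feature = np.array([21.4, 3.8, 6.5, 6.0, 5.9])
gflops = np.array([18.2, 25.1, 22.4, 23.0, 23.3])
r, _ = pearsonr(feature, gflops)
print(f"Pearson r = {r:.2f} -> {correlation_level(r)}")
```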
| Model | Accuracy without grid search/% | Accuracy with grid search/% |
| --- | --- | --- |
| XGBoost | 85 | 89 |
| LightGBM | 83 | 87 |

Tab. 3 Accuracy of model classification
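Tab. 3 shows that tuning the classifiers with grid search raises accuracy by about four percentage points. Below is a minimal sketch of that workflow with scikit-learn's GridSearchCV and XGBoost; the synthetic data and parameter grid are assumptions, since the paper's exact settings are not listed here. LightGBM's `LGBMClassifier` can be swapped in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for (matrix features -> best storage format) training data.
X, y = make_classification(n_samples=500, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative hyper-parameter grid, not the grid used in the paper.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(XGBClassifier(eval_metric="mlogloss"),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```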
| Performance metric | cuSPARSE CSR | cuSPARSE HYB | cuSPARSE ELL | Adaptive computing optimization |
| --- | --- | --- | --- | --- |
| GFlops | 20.5 | 13.7 | 11.0 | 28.7 |
| Speedup | 1.4 | 2.1 | 2.6 | 1.0 |

Tab. 4 Average GFlops and speedup ratio
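The Tab. 4 metrics follow the usual SpMV conventions: each stored non-zero costs one multiply and one add, so GFlops = 2·nnz / (time × 10^9), and the speedup row is consistent with dividing the adaptive scheme's throughput by each baseline (28.7/20.5 ≈ 1.4, 28.7/13.7 ≈ 2.1, 28.7/11.0 ≈ 2.6). A short sketch of that arithmetic; the helper names are illustrative.

```python
def spmv_gflops(nnz: int, seconds: float) -> float:
    """Two floating-point ops (multiply + add) per stored non-zero."""
    return 2.0 * nnz / seconds / 1e9


def speedup(adaptive_gflops: float, baseline_gflops: float) -> float:
    """Speedup of the adaptive scheme over a baseline format."""
    return adaptive_gflops / baseline_gflops


# Reproducing the speedup row of Tab. 4 from its GFlops row.
baselines = {"CSR": 20.5, "HYB": 13.7, "ELL": 11.0}
adaptive = 28.7
for fmt, g in baselines.items():
    print(f"adaptive vs cuSPARSE {fmt}: {speedup(adaptive, g):.1f}x")
```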
| Sparse matrix | Matrix dimensions | Total non-zeros | Avg. non-zeros per row |
| --- | --- | --- | --- |
| HB/sherman2 | 1 080×1 080 | 23 094 | 21.38 |
| SNAP/sx-askubuntu | 159 316×159 316 | | 3.75 |
| MathWorks/Sieber | 2 290×2 290 | | 6.49 |
| DIMACS10/delaunay_n11 | 2 048×2 048 | 12 254 | 5.98 |
| HB/nos4 | 100×100 | 594 | 5.94 |

Tab. 5 Sparse matrix parameters
| GPU architecture | CSR | ELL | HYB | BCSR |
| --- | --- | --- | --- | --- |
| P100 | 1.1 | 2.0 | 1.7 | 2.4 |
| RTX A500 | 1.7 | 1.8 | 2.8 | 2.2 |

Tab. 6 Speedup ratios under other two architectures
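Putting the pieces together, the adaptive scheme amounts to: extract the Tab. 1 features, let the trained classifier predict the best storage format, and launch the matching SpMV kernel. The skeleton below is a hypothetical Python dispatch, not the paper's implementation; the GPU kernel calls are left as placeholders.

```python
from scipy.sparse import csr_matrix

# Candidate storage formats, indexed by the classifier's predicted label.
FORMATS = ["CSR", "ELL", "HYB", "BCSR"]


def adaptive_spmv(A: csr_matrix, x, model, extract_features):
    """Pick a storage format for A with the trained classifier, then run SpMV.

    `model` is a trained classifier (e.g. the grid-searched XGBoost model) and
    `extract_features` the routine sketched after Tab. 1; the feature order
    must match the one used during training.
    """
    feats = list(extract_features(A).values())
    fmt = FORMATS[int(model.predict([feats])[0])]
    # Placeholder kernel: a real implementation would convert A to `fmt`
    # and launch the corresponding GPU kernel (e.g. via cuSPARSE).
    return A @ x, fmt
```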