Journal of Computer Applications, 2021, Vol. 41, Issue (4): 1136-1141. DOI: 10.11772/j.issn.1001-9081.2020071060

Special Issue: Cyberspace security


Parallel implementation and analysis of SKINNY encryption algorithm using CUDA

XIE Wenbo¹, WEI Yongzhuang¹, LIU Zhenghong²

  1. Guangxi Key Laboratory of Cryptography and Information Security (Guilin University of Electronic Technology), Guilin Guangxi 541004, China;
    2. Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (Guilin University of Electronic Technology), Guilin Guangxi 541004, China
  • Received: 2020-07-21  Revised: 2020-10-14  Online: 2021-04-10  Published: 2020-12-23
  • Supported by:
    This work is partially supported by the Guangxi Key Laboratory of Wireless Broadband Communication and Signal Processing Director Fund (GXKL06160112).


  • Corresponding author: WEI Yongzhuang
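  • About the authors: XIE Wenbo (born 1996), male, born in Zaozhuang, Shandong, M. S. candidate; his research interests include block cipher algorithms and GPU parallel computing. WEI Yongzhuang (born 1976), male, born in Tianyang, Guangxi, Ph. D., professor; his research interests include the design and analysis of symmetric cryptographic algorithms. LIU Zhenghong (born 1979), male, born in Hong'an, Hubei, M. S., lecturer; his research interests include wireless broadband communication, FPGA and GPU parallel computing.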

Abstract: To address the low efficiency of the SKINNY encryption algorithm on the Central Processing Unit (CPU), a fast implementation method based on the Graphics Processing Unit (GPU) was proposed. First, an optimization scheme was proposed that exploits the structural characteristics of the SKINNY algorithm to integrate five step-by-step operations into a single combined operation. Then, the characteristics of the algorithm's Electronic CodeBook (ECB) mode and counter (CTR) mode were analyzed, and parallel design schemes covering parallel granularity, memory allocation and related aspects were given. Experimental results show that, compared with the traditional CPU implementation, the SKINNY algorithm implemented with Compute Unified Device Architecture (CUDA) achieves significantly higher efficiency and throughput. Specifically, for data sizes of 16 MB or larger, the ECB-mode implementation reaches a peak efficiency improvement of 99.85% and a peak speedup ratio of 671, while the CTR-mode implementation reaches a peak efficiency improvement of 99.87% and a peak speedup ratio of 765. In particular, the throughput of the proposed SKINNY-256(ECB) parallel algorithm is 1.29 times and 2.55 times that of the existing AES-256(ECB) and SKINNY_ECB parallel algorithms, respectively.
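The paired figures in the abstract (99.85% with a speedup of 671, 99.87% with a speedup of 765) are consistent with reading "efficiency improvement" as the fractional reduction in run time relative to the CPU, (T_CPU - T_GPU) / T_CPU = 1 - 1/S, where S = T_CPU / T_GPU is the speedup ratio. The abstract does not state this definition, so it is an assumption inferred from the numbers. The minimal Python sketch below, with hypothetical function names, checks that reading against the reported peaks.

```python
def speedup(t_cpu: float, t_gpu: float) -> float:
    """Speedup ratio S = T_CPU / T_GPU."""
    if t_cpu <= 0 or t_gpu <= 0:
        raise ValueError("run times must be positive")
    return t_cpu / t_gpu


def efficiency_improvement(s: float) -> float:
    """ASSUMED definition (inferred, not stated in the abstract):
    fractional run-time reduction (T_CPU - T_GPU) / T_CPU = 1 - 1/S."""
    if s <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / s


# Peak speedup ratios reported in the abstract, by mode of operation.
REPORTED_PEAK_SPEEDUP = {"ECB": 671, "CTR": 765}

if __name__ == "__main__":
    for mode, s in REPORTED_PEAK_SPEEDUP.items():
        # Formats the assumed metric as a percentage with two decimals.
        print(f"{mode}: speedup {s} -> efficiency improvement "
              f"{efficiency_improvement(s):.2%}")
```

Under this reading the efficiency improvement only approaches 100% as S grows, so at these scales the speedup ratio is the more discriminating of the two figures: the two modes differ by 94 in speedup but by only 0.02 percentage points in efficiency improvement.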

Key words: SKINNY encryption algorithm, parallel computing, Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), Electronic CodeBook (ECB) mode, Counter (CTR) mode


