Journal of Computer Applications


CNN pruning and quantization compression method for reconfigurable structures

  

  • Received:2025-09-11 Revised:2025-11-17 Online:2025-12-01 Published:2025-12-01


ZHANG Yixin, JIANG Lin, LI Yuancheng, JI Chen

  1. Xi'an University of Science and Technology
  • Corresponding author: JIANG Lin
  • Supported by:
    National Science and Technology Major Project for New Generation Artificial Intelligence: development and application of a self-reconfigurable and self-evolvable AI chip for complex scenarios; adaptive partitioning and speculative parallel acceleration of complex applications on reconfigurable structures; research on automatic thermal-aware technology based on on-chip optical network interconnects

Abstract: To address the heavy memory access overhead, redundant computation, and constrained deployment efficiency caused by the large parameter sizes of convolutional neural networks (CNNs), a joint CNN compression method for reconfigurable structures was proposed. Network structural characteristics and hardware deployment requirements were combined, and pruning and quantization were jointly optimized. First, a convolutional-layer pruning strategy based on feature similarity was introduced, in which feature evaluation, clustering, similarity measurement, and redundancy screening were performed in sequence to identify low-contribution and redundant filters, while progressive threshold pruning was applied to compress redundant weights in fully connected layers. Second, layer sensitivity indices were constructed from Hessian traces, and per-layer precision was allocated adaptively under a bit-width budget. Finally, an optimized deployment scheme was designed for reconfigurable structures. Experiments on the CIFAR-10 dataset showed that compression ratios of 16.2× and 8.38× were achieved for VGG16 and ResNet18, respectively, exceeding the 13.9× ratio of APQ. Compared with a pruning baseline using fixed 16-bit precision, the proposed deployment scheme reduced the inference latency of the pruned VGG16 on a self-reconfigurable and self-evolvable AI chip from 23.3 ms to 9.1 ms, a 2.56× speedup. The method reduces storage and transmission costs while maintaining classification accuracy, improving deployment efficiency and computational performance on edge devices.
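As a rough illustration of the similarity-based filter pruning described in the abstract, the sketch below flattens each convolutional filter, groups near-duplicate filters by cosine similarity, and keeps one representative (the highest-L1-norm member) per group. The cosine metric, the greedy grouping, and the 0.95 threshold are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def prune_similar_filters(weights, sim_threshold=0.95):
    """Greedy similarity-based filter pruning (illustrative sketch).

    weights: array of shape (out_channels, in_channels, k, k).
    Returns the sorted indices of filters to keep; all other filters
    are considered redundant near-duplicates of a kept filter.
    """
    flat = weights.reshape(weights.shape[0], -1)
    # Unit-normalize each flattened filter so dot products are cosines.
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    n = flat.shape[0]
    keep, removed = [], set()
    # Visit filters in descending L1 norm so each cluster keeps its
    # highest-magnitude (presumably most informative) member.
    order = np.argsort(-np.abs(flat).sum(axis=1))
    for i in order:
        if i in removed:
            continue
        keep.append(int(i))
        sims = unit @ unit[int(i)]  # cosine similarity to filter i
        for j in range(n):
            if j == int(i) or j in removed or j in keep:
                continue
            if sims[j] >= sim_threshold:
                removed.add(j)  # near-duplicate of a kept filter
    return sorted(keep)
```

In a full pipeline, the returned indices would be used to slice the layer's output channels and the next layer's corresponding input channels before fine-tuning.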

Key words: reconfigurable structure, convolutional neural network (CNN), model compression, structured pruning, adaptive quantization, artificial intelligence chip


