Journal of Computer Applications


CNN pruning and quantization compression method for reconfigurable structures

  

  • Received:2025-09-11 Revised:2025-11-17 Online:2025-12-01 Published:2025-12-01


ZHANG Yixin, JIANG Lin, LI Yuancheng, JI Chen

  1. Xi'an University of Science and Technology
  • Corresponding author: JIANG Lin
  • Supported by:
    National Science and Technology Major Project for New Generation Artificial Intelligence: development and application of a self-reconfigurable and self-evolvable AI chip for complex scenarios; adaptive partitioning and speculative parallel acceleration of complex applications on reconfigurable structures; research on automatic thermal-aware technology based on on-chip optical network interconnects

Abstract: To address the heavy memory access overhead, redundant computation, and constrained deployment efficiency caused by the large parameter sizes of convolutional neural networks (CNNs), a joint CNN compression method for reconfigurable structures was proposed. Network structural characteristics and hardware deployment requirements were combined, and pruning and quantization were jointly optimized. First, a convolutional-layer pruning strategy based on feature similarity was introduced, in which feature evaluation, clustering, similarity measurement, and redundancy screening were performed in sequence to identify low-contribution and redundant filters, while progressive threshold pruning was applied to compress redundant weights in fully connected layers. Second, layer sensitivity indices were constructed from Hessian traces, and per-layer precision was allocated adaptively under a bit-width budget. Finally, an optimized deployment scheme was designed for reconfigurable structures. Experiments on the CIFAR-10 dataset showed that compression ratios of 16.2× and 8.38× were achieved for VGG16 and ResNet18, respectively, exceeding the 13.9× ratio of APQ. Compared with a pruning baseline using fixed 16-bit precision, the proposed deployment scheme reduced the inference latency of the pruned VGG16 on a self-reconfigurable and self-evolvable AI chip from 23.3 ms to 9.1 ms, a 2.56× speedup. The method reduces storage and transmission costs while maintaining classification accuracy, improving deployment efficiency and computational performance on edge devices.
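As a rough illustration of the similarity-based filter pruning described in the abstract, the sketch below flattens each convolutional filter, groups near-duplicate filters by cosine similarity, and keeps one representative (the highest-L1-norm member) per group. The cosine metric, the greedy grouping, and the 0.95 threshold are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def prune_similar_filters(weights, sim_threshold=0.95):
    """Greedy similarity-based filter pruning (illustrative sketch).

    weights: array of shape (out_channels, in_channels, k, k).
    Returns the sorted indices of filters to keep; all other filters
    are considered redundant near-duplicates of a kept filter.
    """
    flat = weights.reshape(weights.shape[0], -1)
    # Unit-normalize each flattened filter so dot products are cosines.
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    n = flat.shape[0]
    keep, removed = [], set()
    # Visit filters in descending L1 norm so each cluster keeps its
    # highest-magnitude (presumably most informative) member.
    order = np.argsort(-np.abs(flat).sum(axis=1))
    for i in order:
        if i in removed:
            continue
        keep.append(int(i))
        sims = unit @ unit[int(i)]  # cosine similarity to filter i
        for j in range(n):
            if j == int(i) or j in removed or j in keep:
                continue
            if sims[j] >= sim_threshold:
                removed.add(j)  # near-duplicate of a kept filter
    return sorted(keep)
```

In a full pipeline, the returned indices would be used to slice the layer's output channels and the next layer's corresponding input channels before fine-tuning.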

Key words: reconfigurable structure, convolutional neural network (CNN), model compression, structured pruning, adaptive quantization, artificial intelligence chip


