High-resolution real-time semantic segmentation algorithm for edge deployment

Journal of Computer Applications ›› 0, Vol. ›› Issue (): 159-163. DOI: 10.11772/j.issn.1001-9081.2024020218

Linlong ZENG¹,², Miao CHENG¹,²,³, Shaobing ZHANG¹,²,³, Yu ZENG¹,²

^1.Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610213, China
^2.School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
^3.Shenzhen CBPM-KEXIN Banking Technology Company Limited, Shenzhen Guangdong 518206, China

Received: 2024-03-05 Revised: 2024-03-28 Accepted: 2024-04-01 Online: 2025-01-24 Published: 2024-12-31
Contact: Miao CHENG
About the authors: Linlong ZENG (born 1998), male, from Longchang, Sichuan, M.S. candidate, CCF member. Research interests: machine vision, artificial intelligence.
Miao CHENG (born 1983), male, from Chengdu, Sichuan, senior engineer, M.S. Research interests: artificial intelligence, machine vision.
Shaobing ZHANG (born 1979), male, from Chengdu, Sichuan, professor-level senior engineer, M.S. Research interests: high-speed image processing, defect detection, deep learning.
Yu ZENG (born 1999), male, from Chongqing, M.S. candidate. Research interests: time series analysis, data mining.

Abstract

Among the classic tasks in machine vision, semantic segmentation is one of the most computationally expensive, which makes it difficult to deploy segmentation Convolutional Neural Networks (CNNs) in edge computing systems. The Field Programmable Gate Array (FPGA) is a data-stream processing device widely used in industrial vision sensors, and methods for deploying CNNs on FPGAs have been proposed in recent years. However, because of limited computing resources, current techniques struggle to reach acceptable speed and accuracy when performing semantic segmentation of high-resolution images on an FPGA. Based on an analysis of the characteristics of deep learning accelerators on FPGAs, a new segmentation network, the Trilateral Segment Network (TriSeNet), is proposed to perform end-to-end inference for semantic segmentation of high-resolution images on edge accelerators. Deployed on a Xilinx Kria K26 SOM and evaluated on CityScapes semantic segmentation, TriSeNet achieved a mean Intersection over Union (mIoU) of 75% and an inference speed of 32 FPS at an input resolution of 512×1 024. It uses the computing resources at the edge efficiently, reaching a compute-unit utilization of 62.6%, which indicates that TriSeNet is a model successfully adapted to the hardware characteristics of the accelerator.

Key words: edge computing, image segmentation, Convolutional Neural Network (CNN), intelligent computing system, Field Programmable Gate Array (FPGA)
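The abstract reports accuracy as mean Intersection over Union (mIoU). As a reminder of what that metric measures, here is a minimal sketch of the standard definition computed from a confusion matrix. The function name `mean_iou` and the toy pixel counts are illustrative assumptions, not the authors' evaluation code or data from the paper.

```python
def mean_iou(confusion):
    """Mean Intersection over Union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and whose
    predicted class is j. Classes that appear in neither the ground
    truth nor the prediction are skipped.
    """
    ious = []
    for c in range(len(confusion)):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # true c, predicted as another class
        fp = sum(row[c] for row in confusion) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 3-class example with made-up pixel counts (not data from the paper).
toy_confusion = [
    [50, 2, 3],
    [4, 40, 1],
    [6, 0, 30],
]
toy_miou = mean_iou(toy_confusion)
print(f"toy mIoU = {toy_miou:.3f}")
```

Per-class IoU penalizes both missed pixels and false alarms, so averaging it over classes keeps small classes from being swamped by large ones, which plain pixel accuracy does not.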

CLC Number: TP391

Linlong ZENG, Miao CHENG, Shaobing ZHANG, Yu ZENG. High-resolution real-time semantic segmentation algorithm for edge deployment[J]. Journal of Computer Applications, 0, (): 159-163.

Figures/Tables 8

Encoder stage	Output size (C×H×W)	Stride
S1	32×512×1 024	2
S2	64×256×512	2
S3	128×128×256	2
S4	256×64×128	2
S5	256×32×64	2
S6	256×16×32	2
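The sizes in the encoder table follow from halving the spatial resolution at every stage. A minimal sketch of that arithmetic is given below. The 1 024×2 048 input is an assumption inferred from the S1 row (512×1 024 after a stride of 2); the names `STAGES` and `encoder_shapes` are illustrative only.

```python
# (stage, output channels, stride) copied from the encoder table above.
STAGES = [
    ("S1", 32, 2),
    ("S2", 64, 2),
    ("S3", 128, 2),
    ("S4", 256, 2),
    ("S5", 256, 2),
    ("S6", 256, 2),
]


def encoder_shapes(height, width, stages=STAGES):
    """Return {stage: (C, H, W)} after applying each stage's stride in turn."""
    shapes = {}
    for name, channels, stride in stages:
        height //= stride
        width //= stride
        shapes[name] = (channels, height, width)
    return shapes


# Assumed input size (not stated on this page): 1024 x 2048.
shapes = encoder_shapes(1024, 2048)
for name, (c, h, w) in shapes.items():
    print(f"{name}: {c}x{h}x{w}")
```

Under that assumption the six stride-2 stages give a total downsampling factor of 64, which is why S6 ends at 16×32.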

Resource	Used	Available	Utilization/%
LUT	51 351	119 808	42.9
Registers	98 818	239 616	41.2
Block RAM	255	284	89.8
DSP	710	1 248	56.9
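The utilization column is simply used divided by available. A short sketch that recomputes it from the counts in the table and picks out the scarcest resource (names such as `RESOURCES` are illustrative, and the resource labels are given in English):

```python
# (used, available) counts copied from the resource table above.
RESOURCES = {
    "LUT": (51351, 119808),
    "Registers": (98818, 239616),
    "Block RAM": (255, 284),
    "DSP": (710, 1248),
}


def utilization_percent(used, total):
    """Utilization as a percentage, rounded to one decimal place."""
    return round(100.0 * used / total, 1)


utilization = {
    name: utilization_percent(used, total)
    for name, (used, total) in RESOURCES.items()
}
# The resource with the highest utilization is the tightest constraint.
bottleneck = max(utilization, key=utilization.get)

for name, pct in utilization.items():
    print(f"{name}: {pct}%")
print(f"tightest resource: {bottleneck}")
```

The recomputed values agree with the table, and they show that on-chip block RAM, not DSP slices or LUTs, is the resource closest to exhaustion in this deployment.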

Model	Input resolution	mIoU	PA	Inference latency/ms	Frame rate, single-thread/(frame·s^-1)	Frame rate, multi-thread/(frame·s^-1)	Computation/GFLOPs	DPU utilization/%
ENet	512×1 024	0.583	0.632	33.7	11.2	29.7	8.6	20.8
FPN-Mobile	512×1 024	—	0.682	30.4	11.2	32.9	5.4	14.5
ERFNet	512×1 024	0.680	0.517	65.1	8.1	15.4	54.0	67.5
TriSeNet	512×1 024	0.750	0.836	31.6	11.3	31.7	24.3	62.6
HRNet	1 024×2 048	0.802	0.806	—	—	—	378.0	—
MobileNet v2 (ADAS 2D)	1 024×2 048	—	0.458	302.1	1.9	3.3	132.7	35.8
PIDNet-S	1 024×2 048	0.786	—	1 608.0	0.6	—	94.3	4.8
TriSeNet	1 024×2 048	0.750	0.836	140.4	2.7	7.0	97.1	56.3
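In the comparison table, the inference latency and the multi-thread frame rate look consistent with the simple relation frame rate ≈ 1000 / latency (in ms). This relation is inferred from the numbers alone; the page does not say how either column was measured, so the sketch below is only a consistency check, with illustrative names.

```python
# (model, input size, latency in ms, multi-thread frames per second),
# copied from the comparison table above; rows missing either value are omitted.
ROWS = [
    ("ENet", "512x1024", 33.7, 29.7),
    ("FPN-Mobile", "512x1024", 30.4, 32.9),
    ("ERFNet", "512x1024", 65.1, 15.4),
    ("TriSeNet", "512x1024", 31.6, 31.7),
    ("MobileNet v2 (ADAS 2D)", "1024x2048", 302.1, 3.3),
    ("TriSeNet", "1024x2048", 140.4, 7.0),
]


def fps_from_latency(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


# Absolute gap between the implied and the reported frame rate, per row.
gaps = [abs(fps_from_latency(lat) - fps) for _, _, lat, fps in ROWS]
max_gap = max(gaps)

for (model, size, lat, fps), gap in zip(ROWS, gaps):
    print(f"{model} @ {size}: implied {fps_from_latency(lat):.1f}, "
          f"reported {fps}, gap {gap:.2f}")
```

If the relation holds, the single-thread frame rates (roughly a third of the multi-thread ones at 512×1 024) reflect per-frame overhead outside the accelerator that multi-threading overlaps; that reading is likewise an assumption.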

Model	Auxiliary branch	Fusion method	Deep supervision	mIoU	macro_F1
TriSeNet (main branch)	No	None	No	0.682	0.798
TriSeNet+BGA	Yes	BGA	No	0.721	0.828
TriSeNet+AMUX	Yes	AMUX	No	0.734	0.837
TriSeNet+BGA+DeepSup	Yes	BGA	Yes	0.735	0.838
TriSeNet+AMUX+DeepSup	Yes	AMUX	Yes	0.771	0.864
