Journal of Computer Applications, pp. 159-163. DOI: 10.11772/j.issn.1001-9081.2024020218

• Advanced Computing •

High-resolution real-time semantic segmentation algorithm for edge deployment

Linlong ZENG1,2, Miao CHENG1,2,3, Shaobing ZHANG1,2,3, Yu ZENG1,2

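  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610213, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  3. Shenzhen CBPM-KEXIN Banking Technology Company Limited, Shenzhen, Guangdong 518206, China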
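  • Received: 2024-03-05; Revised: 2024-03-28; Accepted: 2024-04-01; Online: 2025-01-24; Published: 2024-12-31
  • Corresponding author: Miao CHENG
  • About the authors: Linlong ZENG (born 1998), male, from Longchang, Sichuan; M.S. candidate and CCF member. Research interests: machine vision, artificial intelligence.
    Miao CHENG (born 1983), male, from Chengdu, Sichuan; senior engineer, M.S. Research interests: artificial intelligence, machine vision.
    Shaobing ZHANG (born 1979), male, from Chengdu, Sichuan; professor-level senior engineer, M.S. Research interests: high-speed image processing, defect detection, deep learning.
    Yu ZENG (born 1999), male, from Chongqing; M.S. candidate. Research interests: time series analysis, data mining.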

Abstract:

Among the classic machine vision tasks, semantic segmentation is one of the most computationally intensive, which makes it difficult to deploy Convolutional Neural Networks (CNNs) for segmentation in edge computing systems. Field Programmable Gate Arrays (FPGAs) are data-stream processing hardware widely used in industrial vision sensors, and in recent years methods for deploying CNNs on FPGAs have been proposed. However, because of limited computing resources, current techniques struggle to achieve acceptable speed and accuracy when performing semantic segmentation of high-resolution images on an FPGA. Based on an analysis of the characteristics of deep learning accelerators on FPGAs, a new segmentation network, the Trilateral Segment Network (TriSeNet), is proposed; it performs end-to-end inference for semantic segmentation of high-resolution images on edge accelerators. Deployed on a Xilinx Kria K26 SOM and evaluated on CityScapes semantic segmentation, TriSeNet achieved a mean Intersection over Union (mIoU) of 75% and an inference speed of 32 FPS at an input resolution of 512×1024. TriSeNet also used edge computing resources efficiently, reaching an arithmetic-unit utilization of 62.6%, which indicates that the model is well adapted to the hardware characteristics of the accelerator.

Key words: edge computing, image segmentation, Convolutional Neural Network (CNN), intelligent computing system, Field Programmable Gate Array (FPGA)
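
For readers unfamiliar with the accuracy metric quoted in the abstract, the following is a minimal sketch of how a mean Intersection over Union (mIoU) can be computed from predicted and ground-truth class labels. It is not the authors' evaluation code: the function name `mean_iou`, the toy labels, and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions made for illustration only.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Average per-class IoU over classes present in pred or target.

    `pred` and `target` are flattened per-pixel class labels.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (the value 255 is an assumed convention, not taken from the paper).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy example: 6 pixels, 3 classes, last pixel unlabeled.
toy_target = [0, 0, 1, 1, 2, 255]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_score = mean_iou(toy_pred, toy_target, num_classes=3)
print(f"toy mIoU = {toy_score:.4f}")
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18. Benchmark evaluations usually accumulate the counts over the whole validation set before dividing, rather than averaging per-image scores. On the speed side, 32 FPS corresponds to a budget of 1/32 s = 31.25 ms per frame.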

CLC number: