Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (12): 3680-3685. DOI: 10.11772/j.issn.1001-9081.2021010076

• Multimedia computing and computer simulation •

Real-time binocular foreground depth estimation algorithm based on sparse convolution

Zhehan QIU (邱哲瀚), Yang LI (李扬)

  1. School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2021-01-15 Revised: 2021-04-22 Accepted: 2021-04-29 Online: 2021-05-12 Published: 2021-12-10
  • Contact: Yang LI
  • About author: QIU Zhehan, born in 1996 in Chaozhou, Guangdong, M.S. candidate. His research interests include stereo matching and object detection.
  • Supported by:
    the Key Program of Qingyuan Industrial High-tech Technology (2020KJJH039)

Abstract:

General stereo matching networks take the complete binocular image pair as input, yet the foreground occupies only a small proportion of the scene, so much of the input information is redundant. To improve the computational efficiency of stereo matching on foreground disparity estimation tasks, a real-time target stereo matching algorithm based on sparse convolution was proposed. Firstly, a segmentation algorithm was used to obtain the sparse foreground mask and the scene semantic features at the same time. Secondly, sparse convolution was used to extract the spatial features of the sparse foreground region, and these features were fused with the scene semantic features. Then, the fused features were input into the decoding module for disparity regression. Finally, the loss was computed against the foreground ground-truth map to generate the disparity map. Test results on the ApolloScape dataset show that the proposed algorithm outperforms the state-of-the-art algorithms PSMNet (Pyramid Stereo Matching Network) and GANet (Guided Aggregation Network) in both accuracy and real-time performance, with a single run time as low as 60.5 ms. In addition, the proposed algorithm is robust to foreground occlusion to a certain extent and can be used for real-time depth estimation of targets.
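
The last two ideas in the abstract, supervising only the foreground pixels and turning the resulting disparity into depth, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the form of the loss, so a smooth L1 loss is assumed here, and the function names `masked_smooth_l1` and `disparity_to_depth` and the parameters `beta`, `focal_px` and `baseline_m` are hypothetical. The depth conversion uses the standard rectified-stereo relation Z = f·B/d.

```python
def masked_smooth_l1(pred, target, mask, beta=1.0):
    """Mean smooth-L1 error over foreground pixels only.

    pred, target, mask are 2D lists of equal shape; a truthy mask
    entry marks a foreground pixel. The smooth-L1 form is an
    assumption, not taken from the paper.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if not m:
                continue  # background pixels do not contribute
            diff = abs(p - t)
            if diff < beta:
                total += 0.5 * diff * diff / beta  # quadratic near zero
            else:
                total += diff - 0.5 * beta  # linear for large errors
            count += 1
    # With no foreground pixels there is nothing to supervise.
    return total / count if count else 0.0


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) via Z = f * B / d.

    Pixels with non-positive disparity are treated as invalid and
    mapped to 0.0.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else 0.0 for d in row]
        for row in disparity
    ]
```

For example, with `pred = [[10.0, 20.0], [30.0, 40.0]]`, `target = [[10.5, 0.0], [33.0, 0.0]]` and `mask = [[1, 0], [1, 0]]`, only the two left-column pixels enter the loss, so background errors, however large, have no effect on training.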

Key words: stereo matching, sparse convolution, deep learning, semantic segmentation, attention mechanism

CLC number: