Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1570-1578. DOI: 10.11772/j.issn.1001-9081.2023050651

• Multimedia computing and computer simulation •

3D shape reconstruction with spatial correlation based on spatio-temporal attention

Yanxin GE (1,2), Tao YAN (1,2,3,4), Jiangfeng ZHANG (1,2), Xiaoying GUO (3), Bin CHEN (4,5)

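  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006
  3. School of Automation and Software Engineering, Shanxi University, Taiyuan 030006
  4. Chongqing Research Institute of Harbin Institute of Technology, Chongqing 401151
  5. International Research Institute of Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen 518055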
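  • Received: 2023-05-24 Revised: 2023-07-20 Accepted: 2023-07-27 Online: 2023-08-03 Published: 2024-05-10
  • Corresponding author: Tao YAN
  • About the authors: GE Yanxin, born in 1997 in Linfen, Shanxi, M. S. candidate. Her research interests include deep learning and 3D shape reconstruction.
    ZHANG Jiangfeng, born in 1998 in Jincheng, Shanxi, M. S. candidate, CCF member. His research interests include 3D shape reconstruction.
    GUO Xiaoying, born in 1985 in Yuanping, Shanxi, Ph. D., associate professor. Her research interests include computer vision.
    CHEN Bin, born in 1970 in Guanghan, Sichuan, Ph. D., professor. His research interests include machine vision.
    First contact: YAN Tao, born in 1987 in Dingxiang, Shanxi, Ph. D., associate professor, CCF member. His research interests include 3D shape reconstruction.
  • Supported by:
    National Natural Science Foundation of China (62006146); Natural Science Research General Program of the Shanxi Province Basic Research Program (202203021221029); Funds for Central-Government-Guided Local Science and Technology Development (YDZJSX20231C001)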

3D shape reconstruction with spatial correlation based on spatio-temporal attention

Yanxin GE (1,2), Tao YAN (1,2,3,4), Jiangfeng ZHANG (1,2), Xiaoying GUO (3), Bin CHEN (4,5)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China
  3. School of Automation and Software Engineering, Shanxi University, Taiyuan Shanxi 030006, China
  4. Chongqing Research Institute of Harbin Institute of Technology, Chongqing 401151, China
  5. International Research Institute of Artificial Intelligence, Harbin Institute of Technology, Shenzhen, Shenzhen Guangdong 518055, China
  • Received: 2023-05-24 Revised: 2023-07-20 Accepted: 2023-07-27 Online: 2023-08-03 Published: 2024-05-10
  • Contact: Tao YAN
  • About the authors: GE Yanxin, born in 1997, M. S. candidate. Her research interests include deep learning and 3D shape reconstruction.
    ZHANG Jiangfeng, born in 1998, M. S. candidate. His research interests include 3D shape reconstruction.
    GUO Xiaoying, born in 1985, Ph. D., associate professor. Her research interests include computer vision.
    CHEN Bin, born in 1970, Ph. D., professor. His research interests include computer vision.
  • Supported by:
    National Natural Science Foundation of China (62006146); Natural Science Foundation of Shanxi Province (202203021221029); Funds for Central-Government-Guided Local Science and Technology Development (YDZJSX20231C001)

Abstract:

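Shape from focus reconstructs 3D shape by modeling the latent relationship between scene depth and defocus blur. However, existing 3D shape reconstruction networks cannot effectively exploit the temporal correlation of image sequences for representation learning. Therefore, a deep network framework based on spatial correlation features of multi-depth-of-field image sequences, the 3D Spatial Correlation Horizon Analysis Model (3D SCHAM), is proposed for 3D shape reconstruction. The model not only accurately captures the edge features between focused and defocused regions within a single frame, but also effectively exploits the spatial dependence features between different frames. First, a temporally continuous model for 3D shape reconstruction is built by constructing a network that jointly scales depth, width and receptive field, which determines the depth at each single point. Second, an attention module based on spatial correlation is introduced to fully learn the "adjacency" and "distance" spatial dependence relationships between frames. In addition, inverted residual bottlenecks are used for resampling to preserve semantic richness across scales. Experimental results on the DDFF 12-Scene real-scene dataset show that, compared with the DfFintheWild model, 3D SCHAM improves accuracy at the three depth-accuracy thresholds 1.25, 1.25² and 1.25³ by 15.34%, 3.62% and 0.86% respectively, verifying the robustness of the model in real scenes.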
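For orientation, the classical shape-from-focus idea that the abstract starts from can be sketched as follows. This is a textbook-style baseline, not the 3D SCHAM network: each frame of a focal stack is scored with a simple focus measure (here an absolute 4-neighbour Laplacian, an assumed choice), and each pixel takes the focus position of the frame where that score is largest. The function names `laplacian_focus` and `depth_from_focus` and the toy data are hypothetical.

```python
def laplacian_focus(img):
    """Absolute 4-neighbour Laplacian per interior pixel; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(
                4 * img[y][x]
                - img[y - 1][x] - img[y + 1][x]
                - img[y][x - 1] - img[y][x + 1]
            )
    return out


def depth_from_focus(stack, focus_positions):
    """Per pixel, return the focus position of the frame with the largest focus measure."""
    measures = [laplacian_focus(frame) for frame in stack]
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[focus_positions[0]] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Ties (e.g. at borders, where every measure is 0) resolve to the first frame.
            best = max(range(len(stack)), key=lambda i: measures[i][y][x])
            depth[y][x] = focus_positions[best]
    return depth


# Toy focal stack: frame 0 is flat (blurred), frame 1 has a sharp centre pixel.
blurred = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
depth = depth_from_focus([blurred, sharp], focus_positions=[1.0, 2.0])
```

A per-pixel argmax like this treats every frame independently, which is the limitation the abstract points to when it argues for learning dependencies between frames.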

Key words: 3D shape reconstruction, spatio-temporal attention, deep learning, spatial dependence relationship, depth map

Abstract:

Shape from focus realizes 3D shape reconstruction by modeling the latent relationship between scene depth and defocus blur. However, existing 3D shape reconstruction networks cannot effectively utilize the temporal correlation of image sequences for representation learning. Therefore, a deep network framework based on spatial correlation features of multi-depth-of-field image sequences, namely the 3D Spatial Correlation Horizon Analysis Model (3D SCHAM), was proposed for 3D shape reconstruction. With this model, not only could the edge features between the focused region and the defocused region of a single image frame be captured accurately, but the spatial dependence features between different image frames could also be utilized effectively. Firstly, a temporally continuous model for 3D shape reconstruction was built by constructing a network with compound scaling of depth, width and receptive field, so as to determine the depth at each single point. Secondly, an attention module based on spatial correlation was introduced to fully learn the "adjacency" and "distance" spatial dependence relationships between frames. In addition, inverted residual bottlenecks were used for resampling to maintain semantic richness across scales. Experimental results on the DDFF 12-Scene real-scene dataset show that, compared with the DfFintheWild model, the accuracy of 3D SCHAM at the three thresholds 1.25, 1.25² and 1.25³ is improved by 15.34%, 3.62% and 0.86% respectively, verifying the robustness of 3D SCHAM in real scenes.
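The three thresholds quoted above correspond to the widely used threshold accuracy for depth maps: the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25 raised to the power k, for k = 1, 2, 3. The sketch below assumes that standard definition and assumes pixels without a positive depth are skipped. The function name `threshold_accuracy` and the sample values are illustrative only and are not taken from the paper.

```python
def threshold_accuracy(pred, gt, k=1):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < 1.25 ** k."""
    threshold = 1.25 ** k
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:
            # Assumed convention: ignore pixels without a positive depth value.
            continue
        valid += 1
        if max(p / g, g / p) < threshold:
            hits += 1
    return hits / valid if valid else 0.0


# Illustrative flattened depth maps (made-up numbers, not results from the paper).
pred = [1.0, 1.2, 1.9, 3.0]
gt = [1.0, 1.0, 1.0, 2.0]
scores = [threshold_accuracy(pred, gt, k) for k in (1, 2, 3)]
```

Because each threshold is looser than the previous one, the three scores are non-decreasing in k, which is consistent with the shrinking gains reported at 1.25² and 1.25³.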

Key words: 3D shape reconstruction, spatio-temporal attention, deep learning, spatial dependence relationship, depth map

CLC number: