

Multi-depth-of-field 3D shape reconstruction with global spatio-temporal feature coupling

ZHANG Jiangfeng1,2, YAN Tao1,2,3,5, CHEN Bin4,5, QIAN Yuhua2,3, SONG Yantao1,2,3

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
    2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China;
    3. Shanxi Provincial Engineering Research Center of Machine Vision and Data Mining, Shanxi University, Taiyuan 030006, China;
    4. International Institute of Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China;
    5. Chongqing Research Institute, Harbin Institute of Technology, Chongqing 401151, China
  • Received: 2022-10-24  Revised: 2023-01-12  Accepted: 2023-01-16  Online: 2023-04-12  Published: 2023-04-12
  • Corresponding author: YAN Tao
  • Supported by:
    Research on 3D reconstruction methods for microscopic dynamic images; Research on deep-learning-based segmentation algorithms for brain magnetic resonance images


Abstract: The key to multi-depth-of-field 3D shape reconstruction is accurately capturing both the focus-defocus transfer information between image frames and the change of the in-focus region within a single frame. To address the inability of existing models to fuse global spatio-temporal information effectively, a Global Spatio-Temporal Feature Coupling (GSTFC) model was proposed to extract the local and global spatio-temporal features of multi-depth-of-field image sequences, with a Depth Focus Volume (DFV) module designed to retain the transition information between focus and defocus. First, 3D-ConvNeXt modules and 3D downsampling convolutional layers were interleaved in the contracting path to capture multi-scale local spatio-temporal features; meanwhile, a 3D-SwinTransformer module was added to the bottleneck to capture the global correlations among the local temporal features of the sequence. Then, the local spatio-temporal features and the global correlations were fused into global spatio-temporal features by an adaptive parameter layer and fed into the expansion path to guide the generation of the depth focus volume. Finally, the depth focus volume extracted sequence weight information through depth attention and retained the focus-defocus transition information to produce the final depth map. Experimental results show that the proposed model reduces the Root Mean Square Error (RMSE) on the FoD500 dataset by 12.5% compared with the state-of-the-art All-in-Focus Depth Network (AiFDepthNet) model, and retains more depth-of-field transition relationships than the traditional Robust Focus Volume Regularization in Shape From Focus (RFVR-SFF) model.
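The following is a minimal PyTorch sketch of the pipeline outlined in the abstract, not the authors' implementation: the module names (ConvNeXtBlock3d, GSTFC), the channel widths, the learnable scalar gate standing in for the adaptive parameter layer, and the plain multi-head self-attention used in place of the 3D-SwinTransformer bottleneck are all illustrative assumptions; only the overall flow (contracting path, global bottleneck, adaptive fusion, expansion path, depth focus volume, depth attention) follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNeXtBlock3d(nn.Module):
    """3D analogue of a ConvNeXt block: depthwise 3D conv + pointwise MLP (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv3d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                           # x: (B, C, T, H, W)
        r = x
        x = self.dwconv(x).permute(0, 2, 3, 4, 1)   # channels-last for LayerNorm
        x = self.pwconv2(F.gelu(self.pwconv1(self.norm(x))))
        return r + x.permute(0, 4, 1, 2, 3)

class GSTFC(nn.Module):
    """Contracting path -> global bottleneck -> expansion path -> depth focus volume."""
    def __init__(self, base=32, heads=4):
        super().__init__()
        self.stem = nn.Conv3d(3, base, kernel_size=3, padding=1)
        # Contracting path: ConvNeXt3d blocks interleaved with spatial-only downsampling,
        # so the focal (temporal) dimension T is preserved.
        self.enc1 = ConvNeXtBlock3d(base)
        self.down1 = nn.Conv3d(base, base * 2, kernel_size=(1, 2, 2), stride=(1, 2, 2))
        self.enc2 = ConvNeXtBlock3d(base * 2)
        self.down2 = nn.Conv3d(base * 2, base * 4, kernel_size=(1, 2, 2), stride=(1, 2, 2))
        # Stand-in for the 3D-SwinTransformer bottleneck: full self-attention
        # over all (frame x height x width) tokens at the coarsest scale.
        self.attn = nn.MultiheadAttention(base * 4, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))    # stands in for the adaptive parameter layer
        self.up2 = nn.ConvTranspose3d(base * 4, base * 2, kernel_size=(1, 2, 2), stride=(1, 2, 2))
        self.dec2 = ConvNeXtBlock3d(base * 2)
        self.up1 = nn.ConvTranspose3d(base * 2, base, kernel_size=(1, 2, 2), stride=(1, 2, 2))
        self.dec1 = ConvNeXtBlock3d(base)
        self.head = nn.Conv3d(base, 1, kernel_size=3, padding=1)  # -> focus volume

    def forward(self, stack, focus_dists):
        # stack: (B, T, 3, H, W) focal stack; focus_dists: (T,) focal positions.
        x = stack.permute(0, 2, 1, 3, 4)            # -> (B, 3, T, H, W)
        s1 = self.enc1(self.stem(x))
        s2 = self.enc2(self.down1(s1))
        local = self.down2(s2)                      # local spatio-temporal features
        B, C, T, h, w = local.shape
        tokens = local.flatten(2).transpose(1, 2)   # (B, T*h*w, C)
        glob, _ = self.attn(tokens, tokens, tokens) # global correlations
        glob = glob.transpose(1, 2).reshape(B, C, T, h, w)
        g = torch.sigmoid(self.gate)                # adaptive fusion of local + global
        fused = g * local + (1 - g) * glob
        d2 = self.dec2(self.up2(fused) + s2)        # expansion path with skip connections
        d1 = self.dec1(self.up1(d2) + s1)
        volume = self.head(d1).squeeze(1)           # (B, T, H, W) depth focus volume
        # Depth attention: per-pixel weights over the focal axis keep soft
        # focus/defocus transitions instead of a hard argmax.
        weights = torch.softmax(volume, dim=1)
        depth = (weights * focus_dists.view(1, -1, 1, 1)).sum(dim=1)
        return depth                                # (B, H, W) depth map

stack = torch.rand(2, 10, 3, 64, 64)                # 10-frame focal stack
dists = torch.linspace(0.0, 1.0, 10)                # normalized focal positions
print(GSTFC()(stack, dists).shape)                  # torch.Size([2, 64, 64])

The closing softmax over the focal axis acts as a soft argmax: rather than committing each pixel to its single best-focused frame, it blends neighboring focal positions, which is how a focus-volume model can retain focus-defocus transition information.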

Key words: 3D shape reconstruction, deep learning, supervised, spatio-temporal feature coupling, depth map


