Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 894-902. DOI: 10.11772/j.issn.1001-9081.2022101589

• Multimedia Computing and Computer Simulation •


Multi-depth-of-field 3D shape reconstruction with global spatio-temporal feature coupling

Jiangfeng ZHANG1,2, Tao YAN1,2,3,4, Bin CHEN4,5, Yuhua QIAN2,3, Yantao SONG1,2,3

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
    2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China
    3. Engineering Research Center for Machine Vision and Data Mining of Shanxi Province (Shanxi University), Taiyuan Shanxi 030006, China
    4. Chongqing Research Institute of Harbin Institute of Technology, Chongqing 401151, China
    5. International Research Institute for Artificial Intelligence, Harbin Institute of Technology (Shenzhen), Shenzhen Guangdong 518055, China
  • Received:2022-10-25 Revised:2023-01-12 Accepted:2023-01-16 Online:2023-03-15 Published:2023-03-10
  • Contact: Tao YAN
  • About author:ZHANG Jiangfeng, born in 1998, M. S. candidate. His research interests include deep learning, 3D reconstruction.
    YAN Tao, born in 1987, Ph. D., associate professor. His research interests include 3D reconstruction.
    CHEN Bin, born in 1970, Ph. D., professor. His research interests include computer vision.
    QIAN Yuhua, born in 1976, Ph. D., professor. His research interests include artificial intelligence, machine learning.
    SONG Yantao, born in 1989, Ph. D., associate professor. Her research interests include medical image processing.
  • Supported by:
    National Natural Science Foundation of China (62006146); Fundamental Research Program of Shanxi Province (201901D211170)


Abstract:

In response to the inability of existing 3D shape reconstruction models to effectively fuse global spatio-temporal information, a Depth Focus Volume (DFV) module was proposed to retain the transition information between focus and defocus, and on this basis a Global Spatio-Temporal Feature Coupling (GSTFC) model was proposed to extract the local and global spatio-temporal features of multi-depth-of-field image sequences. Firstly, 3D-ConvNeXt modules and 3D convolutional layers were interleaved in the contracting path to capture multi-scale local spatio-temporal features, while a 3D-SwinTransformer module was added to the bottleneck to capture the global correlations among the local spatio-temporal features of the multi-depth-of-field image sequence. Then, the local spatio-temporal features and the global correlations were fused into global spatio-temporal features through an adaptive parameter layer and fed into the expanding path to guide the generation of the focus volume. Finally, the DFV module extracted the sequence weight information from the focus volume while retaining the focus-defocus transition information, yielding the final depth map. Experimental results show that GSTFC reduces the Root Mean Square Error (RMSE) on the FoD500 dataset by 12.5% compared with the state-of-the-art All-in-Focus Depth Net (AiFDepthNet) model, and retains more depth-of-field transition relationships than the traditional Robust Focus Volume Regularization in Shape from Focus (RFVR-SFF) model.
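
Read as a pipeline, the abstract describes: a 3D contracting path over the focal stack, a bottleneck that adds sequence-wide (global) context, an adaptive weighted fusion of the two, an expanding path that outputs a focus volume, and a DFV step that converts per-slice weights into a depth map while keeping focus-defocus transitions. The PyTorch sketch below illustrates that data flow only; plain 3D convolutions stand in for the 3D-ConvNeXt blocks, generic multi-head self-attention stands in for the 3D-SwinTransformer bottleneck, a single learnable coefficient stands in for the adaptive parameter layer, and a softmax-weighted expectation over focus distances stands in for the DFV module. All names (GSTFCSketch, focus_dists), channel counts, and layer choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the data flow described in the abstract; shapes and
# modules are assumptions, not the published GSTFC architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GSTFCSketch(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        # Contracting path: plain 3D convs stand in for the interleaved
        # 3D-ConvNeXt blocks that capture local spatio-temporal features.
        self.enc = nn.Sequential(
            nn.Conv3d(1, channels, 3, padding=1), nn.GELU(),
            nn.Conv3d(channels, channels, 3, padding=1), nn.GELU(),
        )
        # Bottleneck: generic self-attention over all sequence/spatial tokens
        # stands in for the 3D-SwinTransformer capturing global correlations.
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=4, batch_first=True)
        # Adaptive parameter: one learnable coefficient coupling local and global features.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        # Expanding path: produces a one-channel focus volume.
        self.dec = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv3d(channels, 1, 3, padding=1),
        )

    def forward(self, stack, focus_dists):
        # stack: (B, 1, S, H, W) focal stack; focus_dists: (S,) focus distance per slice.
        local_feat = self.enc(stack)                        # (B, C, S, H, W)
        b, c, s, h, w = local_feat.shape
        tokens = local_feat.flatten(2).transpose(1, 2)      # (B, S*H*W, C)
        global_feat, _ = self.attn(tokens, tokens, tokens)  # global correlations
        global_feat = global_feat.transpose(1, 2).reshape(b, c, s, h, w)
        fused = self.alpha * local_feat + (1 - self.alpha) * global_feat
        focus_volume = self.dec(fused).squeeze(1)           # (B, S, H, W)
        # DFV-style step: softmax over the sequence axis gives per-pixel slice
        # weights; their expectation over focus distances keeps soft
        # focus-defocus transitions instead of a hard argmax.
        weights = F.softmax(focus_volume, dim=1)            # (B, S, H, W)
        depth = (weights * focus_dists.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)
        return depth, focus_volume


if __name__ == "__main__":
    model = GSTFCSketch()
    stack = torch.rand(1, 1, 8, 16, 16)   # 8-slice focal stack
    dists = torch.linspace(0.1, 1.0, 8)   # assumed focus distances
    depth, _ = model(stack, dists)
    print(depth.shape)                    # torch.Size([1, 16, 16])
```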

Key words: 3D shape reconstruction, deep learning, supervised learning, spatio-temporal feature coupling, depth map

CLC number: