Stereo matching algorithm based on cross-domain adaptation

doi:10.11772/j.issn.1001-9081.2022091398

Abstract

Although Convolutional Neural Networks (CNNs) have made good progress in supervised stereo matching, most CNN-based algorithms perform poorly across domains. To address stereo matching across data domains, a CNN-based Cross-domain Adaptation Stereo Matching Network (CASM-Net) algorithm was proposed, which uses transfer learning to achieve domain-adaptive stereo matching. In the algorithm, a transferable feature extraction module was used to extract rich wide-domain features for cross-domain stereo matching. An adaptive cost optimization module was designed to optimize the cost by adaptively using the similarity information from different receptive fields, thereby obtaining the optimal cost distribution. In addition, a disparity score prediction module was proposed to quantify the stereo matching ability of different regions, and the disparity results were further optimized by adjusting the disparity search range of the image. Experimental results show that on the KITTI2012 and KITTI2015 datasets, compared with the PSMNet (Pyramid Stereo Matching Network) algorithm, CASM-Net reduces 2-PE-Noc, 2-PE-All and 3-PE-fg by 6.1%, 3.3% and 19.3%, respectively; on the Middlebury dataset, without retraining, CASM-Net achieves the best or second-best 2-PE results on all samples among the compared algorithms. These results indicate that CASM-Net improves cross-domain stereo matching.

Key words: supervised stereo matching, Convolutional Neural Network (CNN), transfer learning, cross-domain, disparity score

CLC number: TP391.41

Chuanbiao LI, Yuanwei BI. Stereo matching algorithm based on cross-domain adaptation[J]. Journal of Computer Applications, 2023, 43(10): 3230-3235.

Figures and tables (11)

Fig. 1 Architecture of CASM-Net algorithm

Fig. 2 Encoder-decoder structure

Tab. 1 Experimental results of different network settings on multiple datasets

Experiment	Setting	3-PE on KITTI/%	3-PE on Middlebury/%	3-PE on ETH3D/%	Inference time on KITTI/s
Feature extraction	Original ResNet	4.6	22.93	3.53	0.270
Feature extraction	Transferred ResNet	3.9	22.65	3.47	0.270
Cost optimization	Single-scale cost optimization	5.3	22.63	3.56	0.221
Cost optimization	Multi-scale cost optimization	3.5	22.01	3.24	0.230
Disparity score prediction	Without disparity score prediction	4.7	23.96	3.56	0.225
Disparity score prediction	With disparity score prediction	3.4	22.83	3.15	0.228
Loss function	Smooth L1 loss	4.6	23.86	3.93	0.225
Loss function	Smooth L1 loss + MAE loss	4.3	23.53	3.75	0.225
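
The last two rows of Tab. 1 compare a Smooth L1 loss with a Smooth L1 + MAE loss. This page does not state how the two terms are combined, so the Python sketch below assumes a plain weighted sum over per-pixel disparity residuals. The names `combined_loss` and `mae_weight` are hypothetical and are not taken from the authors' implementation.

```python
def smooth_l1(diff, beta=1.0):
    """Smooth L1 penalty for one residual: quadratic below beta, linear above."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


def combined_loss(pred, target, mae_weight=1.0):
    """Mean Smooth L1 plus a weighted mean absolute error (MAE).

    pred, target: flat sequences of disparities in pixels, same length.
    mae_weight: assumed balancing factor; the source gives no value.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    n = len(pred)
    residuals = [p - t for p, t in zip(pred, target)]
    sl1 = sum(smooth_l1(r) for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    return sl1 + mae_weight * mae
```

For residuals below one pixel the Smooth L1 term shrinks quadratically, so an added MAE term keeps a constant-magnitude gradient on small errors; whether that is the authors' motivation is not stated here.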

Fig. 3 Visualization results of pre-trained features using transferred ResNet model

Fig. 4 Cost probability distribution of cost prediction schemes at different scales

Fig. 5 Comparison results of disparity maps and error maps at different stages

Tab. 2 Experimental results of different methods on KITTI datasets

Algorithm	KITTI2012 2-PE-Noc/%	KITTI2012 2-PE-All/%	KITTI2012 3-PE-Noc/%	KITTI2012 3-PE-All/%	KITTI2015 3-PE-bg/%	KITTI2015 3-PE-fg/%	KITTI2015 3-PE-All/%	Time/s
SGM	8.66	10.16	5.76	7.00	5.06	13.00	6.38	0.11
PSMNet	2.44	3.01	1.49	1.89	1.86	4.62	2.32	0.41
SegStereo	2.66	3.19	1.68	2.03	1.88	4.07	2.25	0.60
PBCP	3.62	5.01	2.36	3.45	2.58	8.74	3.61	68.00
CRD-Fusion	6.27	7.53	4.38	5.40	4.59	13.68	6.11	0.02
iResNet	2.69	3.34	1.71	2.16	/	/	/	0.12
CASM-Net	2.29	2.91	1.52	1.97	1.85	3.73	2.16	0.50
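
The reductions quoted in the abstract (6.1%, 3.3% and 19.3%) can be reproduced from the PSMNet and CASM-Net rows of Tab. 2. The short Python check below assumes the reduction is measured relative to the PSMNet value; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error rate relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Error rates (%) copied from the PSMNet and CASM-Net rows of Tab. 2.
psmnet = {"2-PE-Noc": 2.44, "2-PE-All": 3.01, "3-PE-fg": 4.62}
casm_net = {"2-PE-Noc": 2.29, "2-PE-All": 2.91, "3-PE-fg": 3.73}

reductions = {
    metric: round(relative_reduction(psmnet[metric], casm_net[metric]), 1)
    for metric in psmnet
}
print(reductions)  # {'2-PE-Noc': 6.1, '2-PE-All': 3.3, '3-PE-fg': 19.3}
```

The figures are relative changes, not percentage-point differences: for example, 3-PE-fg drops by 0.89 percentage points, from 4.62% to 3.73%.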

Fig. 6 Qualitative results of different algorithms on KITTI dataset

Tab. 3 2-PE results of different algorithms on Middlebury dataset (%)

Algorithm	Adirondack	ArtL	Motorcycle	Piano	Pipes	Recycle	Teddy
SGM	14.90	15.00	14.30	22.70	15.60	16.90	8.00
PSMNet	62.30	53.40	60.40	54.10	52.60	54.50	34.10
iResNet	9.47	12.90	17.90	20.10	19.20	20.30	8.31
CASM-Net	10.70	11.60	14.60	15.80	18.60	12.70	9.53
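
The tables report N-pixel error rates (2-PE, 3-PE). This page does not define the metric, so the sketch below assumes the common reading: the percentage of pixels with valid ground truth whose absolute disparity error exceeds N pixels. Treating non-finite or non-positive ground-truth values as invalid is also an assumption, and `n_pixel_error` is a hypothetical helper name.

```python
import math


def n_pixel_error(pred, gt, threshold=2.0):
    """Percentage of valid pixels with |pred - gt| > threshold (in pixels).

    pred, gt: flat sequences of disparities, same length.
    Assumption: a ground-truth value that is non-finite or <= 0 marks a
    pixel without ground truth and is skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    errors = [abs(p - g) for p, g in zip(pred, gt) if math.isfinite(g) and g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    bad = sum(1 for e in errors if e > threshold)
    return 100.0 * bad / len(errors)


# Toy example: four pixels, the third has no ground truth (gt == 0).
pred = [10.0, 12.5, 30.0, 7.0]
gt = [10.5, 15.0, 0.0, 7.0]
two_pe = n_pixel_error(pred, gt, threshold=2.0)    # one of three valid pixels is off by 2.5 px
three_pe = n_pixel_error(pred, gt, threshold=3.0)  # no valid pixel exceeds 3 px
```

Under this reading, the Noc and All variants in Tab. 2 would differ only in which pixels count as valid (non-occluded pixels versus all pixels with ground truth).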

Fig. 7 Qualitative results of different algorithms on Middlebury dataset

Fig. 8 Qualitative results of CASM-Net algorithm on ETH3D dataset

