Stereo matching algorithm based on cross-domain adaptation

doi:10.11772/j.issn.1001-9081.2022091398

Abstract:

Although Convolutional Neural Networks (CNNs) have made good progress in supervised stereo matching, most CNN-based algorithms perform poorly across data domains. To address cross-domain stereo matching, a CNN-based Cross-domain Adaptation Stereo Matching Network (CASM-Net) algorithm is proposed, which uses transfer learning to achieve domain-adaptive stereo matching. First, a transferable feature extraction module extracts rich wide-domain features for the cross-domain stereo matching task. Second, an adaptive cost optimization module adaptively uses the similarity information from different receptive fields to optimize the cost and obtain the optimal cost distribution. Third, a disparity score prediction module quantifies the stereo matching ability of different regions, and the disparity results are further optimized by adjusting the disparity search range of the image. Experimental results show that on the KITTI2012 and KITTI2015 datasets, compared with the PSMNet (Pyramid Stereo Matching Network) algorithm, CASM-Net reduces 2-PE-Noc, 2-PE-All and 3-PE-fg by 6.1%, 3.3% and 19.3%, respectively; on the Middlebury dataset, without re-training, CASM-Net achieves the best or second-best 2-PE results on all samples among the compared algorithms. These results indicate that CASM-Net improves cross-domain stereo matching.

Key words: supervised stereo matching, Convolutional Neural Network (CNN), transfer learning, cross-domain, disparity score
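The abstract speaks of an optimized cost distribution but does not say how a disparity is read out of it. A common choice in PSMNet-style networks is soft-argmin regression, and the sketch below assumes that choice purely for illustration; the function name `soft_argmin` and the sample costs are hypothetical and are not taken from the paper.

```python
import math


def soft_argmin(costs):
    """Return (expected disparity, probabilities) for one pixel.

    costs[d] is the matching cost at integer disparity d; a lower cost
    means a better match. Probabilities are softmax(-cost).
    """
    shift = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - shift)) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Expected value of the disparity under the probability distribution.
    disparity = sum(d * p for d, p in enumerate(probs))
    return disparity, probs


# Hypothetical per-pixel costs over six candidate disparities (0..5).
costs = [9.0, 7.5, 1.0, 0.8, 6.0, 9.5]
disparity, probs = soft_argmin(costs)
print(round(disparity, 2))
```

A sharply peaked distribution gives an estimate close to the best-matching disparity, while a flat or multi-modal one pulls the estimate toward a blend of candidates, which is one reason a well-shaped cost distribution matters.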

CLC number:

TP391.41

Chuanbiao LI, Yuanwei BI. Stereo matching algorithm based on cross-domain adaptation[J]. Journal of Computer Applications, 2023, 43(10): 3230-3235.

Figures and tables (11)

Fig. 1 Architecture of CASM-Net algorithm

Fig. 2 Encoder-decoder structure

Tab. 1 Experimental results of different network settings on multiple datasets

Experiment	Setting	3-PE on KITTI/%	3-PE on Middlebury/%	3-PE on ETH3D/%	Inference time on KITTI/s
Feature extraction	Original ResNet	4.6	22.93	3.53	0.270
Feature extraction	Transferred ResNet	3.9	22.65	3.47	0.270
Cost optimization	Single-scale cost optimization	5.3	22.63	3.56	0.221
Cost optimization	Multi-scale cost optimization	3.5	22.01	3.24	0.230
Disparity score prediction	Without disparity score prediction	4.7	23.96	3.56	0.225
Disparity score prediction	With disparity score prediction	3.4	22.83	3.15	0.228
Loss function	Smooth L1 loss	4.6	23.86	3.93	0.225
Loss function	Smooth L1 loss + MAE loss	4.3	23.53	3.75	0.225
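The last two rows of the table compare a Smooth L1 loss with a Smooth L1 loss combined with an MAE loss. A minimal sketch of such a combination follows, assuming the standard Smooth L1 definition with threshold `beta` and equal weights for the two terms; the weights `w_smooth` and `w_mae` and the function names are hypothetical, since the page does not state how the terms are balanced.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: quadratic below beta, linear above it."""
    terms = []
    for p, t in zip(pred, target):
        d = abs(p - t)
        terms.append(0.5 * d * d / beta if d < beta else d - 0.5 * beta)
    return sum(terms) / len(terms)


def mae(pred, target):
    """Mean absolute error between predicted and true disparities."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred, target, w_smooth=1.0, w_mae=1.0):
    """Weighted sum of the two terms; the weights are assumed, not sourced."""
    return w_smooth * smooth_l1(pred, target) + w_mae * mae(pred, target)


# Hypothetical predicted and ground-truth disparities for three pixels.
pred = [10.0, 20.5, 33.0]
target = [10.0, 20.0, 30.0]
print(combined_loss(pred, target))
```

The Smooth L1 term damps the influence of small residuals, while the MAE term keeps a constant-magnitude gradient on every pixel, so the sum penalizes small errors more strongly than Smooth L1 alone.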

Fig. 3 Visualization results of pre-trained features using transferred ResNet model

Fig. 4 Cost probability distribution of cost prediction schemes at different scales

Fig. 5 Comparison results of disparity maps and error maps at different stages

Tab. 2 Experimental results of different methods on KITTI datasets

Algorithm	KITTI2012 2-PE-Noc/%	KITTI2012 2-PE-All/%	KITTI2012 3-PE-Noc/%	KITTI2012 3-PE-All/%	KITTI2015 3-PE-bg/%	KITTI2015 3-PE-fg/%	KITTI2015 3-PE-All/%	Time/s
SGM	8.66	10.16	5.76	7.00	5.06	13.00	6.38	0.11
PSMNet	2.44	3.01	1.49	1.89	1.86	4.62	2.32	0.41
SegStereo	2.66	3.19	1.68	2.03	1.88	4.07	2.25	0.60
PBCP	3.62	5.01	2.36	3.45	2.58	8.74	3.61	68.00
CRD-Fusion	6.27	7.53	4.38	5.40	4.59	13.68	6.11	0.02
iResNet	2.69	3.34	1.71	2.16	/	/	/	0.12
CASM-Net	2.29	2.91	1.52	1.97	1.85	3.73	2.16	0.50
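The reductions of 6.1%, 3.3% and 19.3% quoted in the abstract are consistent with relative reductions computed from the PSMNet and CASM-Net rows of this table. The short check below reproduces that arithmetic; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline, proposed):
    """Relative reduction of an error rate, in percent of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# (PSMNet, CASM-Net) error rates in percent, copied from the table.
table2 = {
    "2-PE-Noc": (2.44, 2.29),
    "2-PE-All": (3.01, 2.91),
    "3-PE-fg": (4.62, 3.73),
}

reductions = {
    metric: round(relative_reduction(base, ours), 1)
    for metric, (base, ours) in table2.items()
}
print(reductions)
# → {'2-PE-Noc': 6.1, '2-PE-All': 3.3, '3-PE-fg': 19.3}
```

These are relative changes, not percentage-point differences: the absolute gap on 2-PE-Noc, for example, is 0.15 percentage points.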

Fig. 6 Qualitative results of different algorithms on KITTI dataset

Tab. 3 2-PE results of different algorithms on Middlebury dataset (%)

Algorithm	Adirondack	ArtL	Motorcycle	Piano	Pipes	Recycle	Teddy
SGM	14.90	15.00	14.30	22.70	15.60	16.90	8.00
PSMNet	62.30	53.40	60.40	54.10	52.60	54.50	34.10
iResNet	9.47	12.90	17.90	20.10	19.20	20.30	8.31
CASM-Net	10.70	11.60	14.60	15.80	18.60	12.70	9.53
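The 2-PE and 3-PE figures on this page are N-pixel error rates. A minimal sketch follows, assuming the usual convention that N-PE is the percentage of pixels with ground truth whose absolute disparity error exceeds N pixels. The function name `n_pixel_error` and the use of `None` for pixels without ground truth are illustrative choices. The official KITTI 2015 metric additionally requires the error to exceed 5% of the true disparity, which this sketch omits.

```python
def n_pixel_error(pred, truth, n=2.0):
    """Percentage of valid pixels whose absolute error is above n pixels.

    pred and truth are flat sequences of disparities; entries of truth
    that are None are treated as having no ground truth and are skipped.
    """
    valid = [(p, t) for p, t in zip(pred, truth) if t is not None]
    if not valid:
        raise ValueError("no pixels with ground truth")
    bad = sum(1 for p, t in valid if abs(p - t) > n)
    return 100.0 * bad / len(valid)


# Hypothetical disparities for five pixels; the fourth has no ground truth.
pred = [10.0, 12.5, 30.0, 7.0, 5.0]
truth = [10.5, 15.0, 30.0, None, 9.0]

print(n_pixel_error(pred, truth, n=2.0))  # → 50.0
print(n_pixel_error(pred, truth, n=3.0))  # → 25.0
```

Restricting the valid set to non-occluded pixels would give a Noc-style variant, while keeping every pixel with ground truth corresponds to an All-style variant.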

Fig. 7 Qualitative results of different algorithms on Middlebury dataset

Fig. 8 Qualitative results of CASM-Net algorithm on ETH3D dataset

