Multi-view Depth Estimation Based on Adaptive Space Feature Enhancement

doi:10.16182/j.issn1004731x.joss.23-0112

Abstract:

A multi-view depth estimation algorithm based on adaptive space feature enhancement (ASFE) is proposed to improve the accuracy of multi-view depth estimation. A multi-scale feature extraction module composed of an improved feature pyramid network (FPN) and ASFE is designed to obtain multi-scale feature maps that carry global context information and position information. A residual learning network refines the depth map, preventing the blurred reconstructed edges that repeated convolution operations can cause. A focal loss function, constructed following a classification formulation, strengthens the discriminative ability of the network model. Experimental results show that, on the Technical University of Denmark (DTU) dataset, the proposed algorithm reduces the overall error, running time, and GPU memory usage by 14.08%, 72.15%, and 4.62%, respectively, compared with Cascade MVSNet (CasMVSNet). On the Tanks and Temples dataset, the model achieves a higher Mean score than the other algorithms compared, which demonstrates the effectiveness of the proposed ASFE-based multi-view depth estimation algorithm.

Key words: multi-view depth estimation, adaptive space feature enhancement, residual learning network, convolution operation, focal loss function

CLC number: TP391.4

Wei Dong, Liu Huan, Zhang Xiaohan, et al. Multi-view Depth Estimation Based on Adaptive Space Feature Enhancement[J]. Journal of System Simulation, 2024, 36(1): 110-119.

Figures/Tables (9): Fig. 1, Table 1, Fig. 2, Fig. 3, Table 2, Table 3, Fig. 4, Table 4, Fig. 5

References (25)

1	Galliani Silvano, Lasinger Katrin, Schindler Konrad. Massively Parallel Multiview Stereopsis by Surface Normal Diffusion[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2015: 873-881.
2	Žbontar Jure, LeCun Y. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches[J]. Journal of Machine Learning Research, 2016, 17(1): 2287-2318.
3	Luo Wenjie, Schwing Alexander G, Urtasun Raquel. Efficient Deep Learning for Stereo Matching[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2016: 5695-5703.
4	Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end Learning of Geometry and Context for Deep Stereo Regression[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 66-75.
5	Ji Mengqi, Gall Juergen, Zheng Haitian, et al. SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 2326-2334.
6	Kar A, Häne Christian, Malik J. Learning a Multi-view Stereo Machine[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017: 364-375.
7	Yao Yao, Luo Zixin, Li Shiwei, et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo[C]//Computer Vision-ECCV 2018. Cham: Springer International Publishing, 2018: 785-801.
8	Yao Yao, Luo Zixin, Li Shiwei, et al. Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2019: 5520-5529.
9	Yan Jianfeng, Wei Zizhuang, Yi Hongwei, et al. Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking[C]//Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020: 674-689.
10	Gu Xiaodong, Fan Zhiwen, Zhu Siyu, et al. Cascade Cost Volume for High-resolution Multi-view Stereo and Stereo Matching[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2020: 2492-2501.
11	Weilharter Rafael, Fraundorfer Friedrich. HighRes-MVSNet: A Fast Multi-view Stereo Network for Dense 3D Reconstruction from High-resolution Images[J]. IEEE Access, 2021, 9: 11306-11315.
12	Ye Chunkai, Wan Wanggen. Feature Pyramid Network for Multi-view Depth Estimation[J]. Electronic Measurement Technology, 2020, 43(11): 91-95. (in Chinese)
13	Yu Anzhu, Guo Wenyue, Liu Bing, et al. Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 448-460.
14	Liu Wanjun, Wang Junkai, Qu Haicheng. Multi-scale Cost Volumes Information Sharing Based Multi-view Stereo Reconstructed Model[J]. Journal of Image and Graphics, 2022, 27(11): 3331-3342. (in Chinese)
15	Lin T Y, Dollár Piotr, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2017: 936-944.
16	Ronneberger Olaf, Fischer Philipp, Brox Thomas. U-Net: Convolutional Networks for Biomedical Image Segmentation[C]//Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer International Publishing, 2015: 234-241.
17	Rota Bulò Samuel, Porzi Lorenzo, Kontschieder Peter. In-place Activated BatchNorm for Memory-optimized Training of DNNs[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE, 2018: 5639-5647.
18	Dai Jifeng, Qi Haozhi, Xiong Yuwen, et al. Deformable Convolutional Networks[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 764-773.
19	Hou Qibin, Zhou Daquan, Feng Jiashi. Coordinate Attention for Efficient Mobile Network Design[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2021: 13708-13717.
20	Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE, 2018: 7132-7141.
21	Xu Ning, Price B, Cohen S, et al. Deep Image Matting[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2017: 311-320.
22	Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 2999-3007.
23	Aanæs H, Jensen R R, Vogiatzis G, et al. Large-scale Data for Multiple-view Stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168.
24	Knapitsch A, Park J, Zhou Qianyi, et al. Tanks and Temples: Benchmarking Large-scale Scene Reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78.
25	Seitz S M, Curless B, Diebel J, et al. A Comparison and Evaluation of Multi-view Stereo Reconstruction Algorithms[C]//2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Piscataway, NJ, USA: IEEE, 2006: 519-528.

Module	Convolution layer description	Output size
Conv0	3×3 8, Inplace-ABN	W×H×8
Conv0	3×3 8, Inplace-ABN	W×H×8
Conv1	5×5 16, s=2, Inplace-ABN	W/2×H/2×16
	3×3 16, Inplace-ABN	W/2×H/2×16
	3×3 16, Inplace-ABN	W/2×H/2×16
Conv2	5×5 32, s=2, Inplace-ABN	W/4×H/4×32
	3×3 32, Inplace-ABN	W/4×H/4×32
	3×3 32, Inplace-ABN	W/4×H/4×32
Out0	Conv2, 1×1 32, Inplace-ABN	W/4×H/4×32
Out1	(2*Out0, Conv1), 1×1 16, Inplace-ABN	W/2×H/2×16
Out2	(2*Out1, Conv0), 1×1 8, Inplace-ABN	W×H×8
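
The output sizes in the table above can be sanity-checked with the standard convolution output-size formula. The following is a minimal sketch, not the authors' code: it assumes a padding of (k - 1) / 2 for every k×k kernel (the table does not state the padding), and the names `conv_out` and `trace_backbone` as well as the 640×512 input are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length of a convolution along one axis.

    Assumes padding = (kernel - 1) // 2, which the table does not state.
    """
    padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


# (stage name, kernel size, output channels, stride) for the bottom-up path,
# copied from the table; Inplace-ABN does not change the tensor shape.
LAYERS = [
    ("Conv0", 3, 8, 1), ("Conv0", 3, 8, 1),
    ("Conv1", 5, 16, 2), ("Conv1", 3, 16, 1), ("Conv1", 3, 16, 1),
    ("Conv2", 5, 32, 2), ("Conv2", 3, 32, 1), ("Conv2", 3, 32, 1),
]


def trace_backbone(width: int, height: int) -> dict:
    """Return the (width, height, channels) shape at the end of each stage."""
    shapes = {}
    w, h = width, height
    for name, kernel, channels, stride in LAYERS:
        w, h = conv_out(w, kernel, stride), conv_out(h, kernel, stride)
        shapes[name] = (w, h, channels)
    return shapes


shapes = trace_backbone(640, 512)  # example input size, chosen for illustration
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions the three stages end at W×H×8, W/2×H/2×16, and W/4×H/4×32, matching the table; the Out0 to Out2 rows then use 1×1 convolutions, which leave the spatial size unchanged.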

Model	CA	DCN	Focal loss	Acc/mm	Comp/mm	Overall/mm
CasMVSNet^[10]	×	×	×	0.325	0.385	0.355
Model 1	√	×	×	0.343	0.320	0.331
Model 2	√	√	×	0.329	0.309	0.319
Ours	√	√	√	0.323	0.287	0.305
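
The focal loss column in the table above refers to the classification-style loss mentioned in the abstract. As a rough illustration only, the sketch below implements the generic focal loss of Lin et al. (reference 22), FL(p_t) = -(1 - p_t)^γ log(p_t), for a probability distribution over discrete depth hypotheses at a single pixel. Treating depth hypotheses as classes, the function name `focal_loss`, and the value γ = 2 are assumptions; the exact formulation used by the authors is not given here.

```python
import math


def focal_loss(probs, target_index, gamma=2.0, eps=1e-12):
    """Generic focal loss for one pixel.

    probs: probabilities over depth hypotheses (assumed to sum to 1).
    target_index: index of the hypothesis closest to the ground-truth depth.
    gamma: focusing parameter; gamma = 0 reduces to plain cross-entropy.
    """
    p_t = min(max(probs[target_index], eps), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


confident = [0.05, 0.90, 0.05]  # well-classified pixel, true hypothesis = index 1
uncertain = [0.40, 0.35, 0.25]  # hard pixel, same true hypothesis

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    ce = focal_loss(probs, 1, gamma=0.0)
    fl = focal_loss(probs, 1, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}")
```

With γ = 2 the loss of the confident pixel is scaled by (1 - 0.9)² = 0.01, while that of the uncertain pixel is scaled by (1 - 0.35)² ≈ 0.42, so hard pixels contribute relatively more to training.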

Model	Acc/mm	Comp/mm	Overall/mm	GPU memory/MB	Run-time/s
Gipuma^[1]	0.283	0.873	0.578	-	-
SurfaceNet^[5]	0.450	1.040	0.745	-	-
MVSNet^[7]	0.396	0.527	0.462	22 511	1.210
R-MVSNet^[8]	0.385	0.459	0.422	6 915	1.28
D2HC-RMVSNet^[9]	0.395	0.378	0.386	13 946	2.6
CasMVSNet^[10]	0.325	0.385	0.355	9 891	0.492
HighRes-MVSNet^[11]	0.354	0.393	0.373	1 119	0.10
EPM-RMVSNet^[12]	0.468	0.521	0.495	-	-
AACVP-MVSNet^[13]	0.357	0.326	0.341	1 048	-
MCV-MVSNet^[14]	0.353	0.357	0.355	21 400	3.1
Ours	0.323	0.287	0.305	9 434	0.137
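
The percentage reductions quoted in the abstract follow from the CasMVSNet and Ours rows of the table above. The short sketch below reproduces that arithmetic; the helper name `relative_reduction` is hypothetical and the numbers are copied from the table.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# (CasMVSNet value, proposed-method value) taken from the table above.
rows = {
    "Overall/mm": (0.355, 0.305),
    "Run-time/s": (0.492, 0.137),
    "GPU memory/MB": (9891, 9434),
}

for metric, (baseline, proposed) in rows.items():
    print(f"{metric}: {relative_reduction(baseline, proposed):.2f}% lower")
```

Rounded to two decimals, the three values are 14.08%, 72.15%, and 4.62%, in agreement with the abstract.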

Model	Mean	Family	Francis	Horse	L.H.	M60	Panther	P.G.	Train
MVSNet^[7]	43.48	55.99	28.55	25.07	50.79	53.96	50.86	47.90	34.69
R-MVSNet^[8]	48.40	69.96	46.65	32.59	42.95	51.88	48.80	52.00	42.38
D2HC-RMVSNet^[9]	59.20	74.69	56.04	49.42	60.08	59.81	59.61	60.04	53.92
CasMVSNet^[10]	56.84	76.37	58.45	46.26	55.81	56.11	54.06	58.18	49.51
HighRes-MVSNet^[11]	49.81	66.62	44.17	30.84	55.13	53.20	50.32	55.45	42.73
Ours	61.43	78.74	64.79	53.37	60.31	61.86	58.75	58.42	55.20
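
The Mean column appears to be the unweighted average of the eight per-scene scores. The sketch below checks this for the Ours row; the interpretation of Mean as a plain average is an assumption inferred from the numbers, and `mean_score` is a hypothetical helper.

```python
# Per-scene scores for the proposed method ("Ours" row of the table above).
ours = [78.74, 64.79, 53.37, 60.31, 61.86, 58.75, 58.42, 55.20]


def mean_score(scores):
    """Unweighted mean over the per-scene scores."""
    return sum(scores) / len(scores)


print(f"Mean over {len(ours)} scenes: {mean_score(ours):.2f}")
```

The result rounds to 61.43, the Mean reported for the proposed method.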
