Multi-view Depth Estimation Based on Adaptive Space Feature Enhancement

doi:10.16182/j.issn1004731x.joss.23-0112

Abstract

A multi-view depth estimation algorithm based on adaptive space feature enhancement (ASFE) is presented to improve the accuracy of multi-view depth estimation. A multi-scale feature extraction module composed of an improved feature pyramid network (FPN) and ASFE is designed; it produces multi-scale feature maps that carry global context-aware information and coordinate information. A residual learning network refines the depth map to counter the blurring of reconstructed edges caused by repeated convolution operations. The algorithm builds a focal loss function by treating depth prediction as a classification problem, which strengthens the predictive ability of the network model. Experiments on the Technical University of Denmark (DTU) dataset show that, compared with the cascade MVSNet (CasMVSNet) method, the proposed method reduces the overall error, running time, and GPU memory usage by 14.08%, 72.15%, and 4.62%, respectively. On the Tanks and Temples dataset, the model's Mean score is higher than that of the other algorithms compared, which supports the effectiveness of the proposed ASFE-based multi-view depth estimation algorithm.
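
The abstract does not give the loss formula. One plausible reading, offered here only as an assumption, is that the depth hypotheses at each pixel are treated as classes and the focal loss of Lin et al. is applied to the softmax probability of the ground-truth depth bin. The sketch below illustrates that reading in plain Python; the function names and the `gamma` and `alpha` defaults are illustrative and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's depth-hypothesis scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target_bin, gamma=2.0, alpha=1.0):
    """Focal loss for one pixel: -alpha * (1 - p_t)^gamma * log(p_t).

    logits:     one score per depth hypothesis (class).
    target_bin: index of the hypothesis nearest the ground-truth depth.
    gamma, alpha: assumed defaults; the abstract does not state them.
    """
    p_t = softmax(logits)[target_bin]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# With gamma = 0 the focal loss reduces to ordinary cross-entropy.
scores = [0.1, 2.0, 0.3, -1.0]
ce = focal_loss(scores, target_bin=1, gamma=0.0)
fl = focal_loss(scores, target_bin=1, gamma=2.0)
```

Because the factor `(1 - p_t) ** gamma` shrinks toward zero for confidently correct pixels, the total loss is dominated by pixels whose depth bin is still poorly predicted.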

Key words: multi-view depth estimation, adaptive space feature enhancement, residual learning network, convolution operation, focal loss function

CLC Number: TP391.4

Wei Dong, Liu Huan, Zhang Xiaohan, Li Changkai, Sun Tianyi, Zhang Ziyou. Multi-view Depth Estimation Based on Adaptive Space Feature Enhancement[J]. Journal of System Simulation, 2024, 36(1): 110-119.

Figures/Tables 9: Fig. 1, Table 1, Fig. 2, Fig. 3, Table 2, Table 3, Fig. 4, Table 4, Fig. 5

References 25

1	Galliani Silvano, Lasinger Katrin, Schindler Konrad. Massively Parallel Multiview Stereopsis by Surface Normal Diffusion[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2015: 873-881.
2	Žbontar Jure, LeCun Y. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches[J]. Journal of Machine Learning Research, 2016, 17(1): 2287-2318.
3	Luo Wenjie, Schwing Alexander G, Urtasun Raquel. Efficient Deep Learning for Stereo Matching[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2016: 5695-5703.
4	Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end Learning of Geometry and Context for Deep Stereo Regression[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 66-75.
5	Ji Mengqi, Gall Juergen, Zheng Haitian, et al. SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 2326-2334.
6	Kar A, Häne Christian, Malik J. Learning a Multi-view Stereo Machine[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017: 364-375.
7	Yao Yao, Luo Zixin, Li Shiwei, et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo[C]//Computer Vision-ECCV 2018. Cham: Springer International Publishing, 2018: 785-801.
8	Yao Yao, Luo Zixin, Li Shiwei, et al. Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2019: 5520-5529.
9	Yan Jianfeng, Wei Zizhuang, Yi Hongwei, et al. Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking[C]//Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020: 674-689.
10	Gu Xiaodong, Fan Zhiwen, Zhu Siyu, et al. Cascade Cost Volume for High-resolution Multi-view Stereo and Stereo Matching[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2020: 2492-2501.
11	Weilharter Rafael, Fraundorfer Friedrich. HighRes-MVSNet: A Fast Multi-view Stereo Network for Dense 3D Reconstruction from High-resolution Images[J]. IEEE Access, 2021, 9: 11306-11315.
12	Ye Chunkai, Wan Wanggen. Feature Pyramid Network for Multi-view Depth Estimation[J]. Electronic Measurement Technology, 2020, 43(11): 91-95. (in Chinese)
13	Yu Anzhu, Guo Wenyue, Liu Bing, et al. Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 448-460.
14	Liu Wanjun, Wang Junkai, Qu Haicheng. Multi-scale Cost Volumes Information Sharing Based Multi-view Stereo Reconstructed Model[J]. Journal of Image and Graphics, 2022, 27(11): 3331-3342. (in Chinese)
15	Lin T Y, Dollár Piotr, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2017: 936-944.
16	Ronneberger Olaf, Fischer Philipp, Brox Thomas. U-Net: Convolutional Networks for Biomedical Image Segmentation[C]//Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015. Cham: Springer International Publishing, 2015: 234-241.
17	Rota Bulò Samuel, Porzi Lorenzo, Kontschieder Peter. In-place Activated BatchNorm for Memory-optimized Training of DNNs[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE, 2018: 5639-5647.
18	Dai Jifeng, Qi Haozhi, Xiong Yuwen, et al. Deformable Convolutional Networks[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 764-773.
19	Hou Qibin, Zhou Daquan, Feng Jiashi. Coordinate Attention for Efficient Mobile Network Design[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2021: 13708-13717.
20	Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ, USA: IEEE, 2018: 7132-7141.
21	Xu Ning, Price B, Cohen S, et al. Deep Image Matting[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ, USA: IEEE, 2017: 311-320.
22	Lin T Y, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, NJ, USA: IEEE, 2017: 2999-3007.
23	Aanæs H, Jensen R R, Vogiatzis G, et al. Large-scale Data for Multiple-view Stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168.
24	Knapitsch A, Park J, Zhou Qianyi, et al. Tanks and Temples: Benchmarking Large-scale Scene Reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78.
25	Seitz S M, Curless B, Diebel J, et al. A Comparison and Evaluation of Multi-view Stereo Reconstruction Algorithms[C]//2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Piscataway, NJ, USA: IEEE, 2006: 519-528.

Module	Convolution layer description	Output size
Conv0	3×3 8, Inplace-ABN	W×H×8
	3×3 8, Inplace-ABN	W×H×8
Conv1	5×5 16, s=2, Inplace-ABN	W/2×H/2×16
	3×3 16, Inplace-ABN	W/2×H/2×16
	3×3 16, Inplace-ABN	W/2×H/2×16
Conv2	5×5 32, s=2, Inplace-ABN	W/4×H/4×32
	3×3 32, Inplace-ABN	W/4×H/4×32
	3×3 32, Inplace-ABN	W/4×H/4×32
Out0	Conv2, 1×1 32, Inplace-ABN	W/4×H/4×32
Out1	(2*Out0, Conv1), 1×1 16, Inplace-ABN	W/2×H/2×16
Out2	(2*Out1, Conv0), 1×1 8, Inplace-ABN	W×H×8
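
The output sizes in the table follow from the strides alone: each `s=2` convolution halves the width and height, and each `2*` upsampling on the top-down path doubles them again. The helper below is a small sketch that recomputes the sizes for a given input resolution. The function name is hypothetical, and the code assumes even input dimensions and padding that preserves size at stride 1.

```python
def pyramid_shapes(width, height):
    """Return (W, H, C) per stage, mirroring the rows of the table above."""
    stages = {}
    w, h = width, height
    # Bottom-up path: (stage name, output channels, stride of first conv).
    for name, channels, stride in [("Conv0", 8, 1), ("Conv1", 16, 2), ("Conv2", 32, 2)]:
        w, h = w // stride, h // stride
        stages[name] = (w, h, channels)
    # Top-down path: Out0 is taken from Conv2; every later output is the
    # previous one upsampled 2x and merged with the matching encoder stage.
    stages["Out0"] = stages["Conv2"]
    stages["Out1"] = (stages["Out0"][0] * 2, stages["Out0"][1] * 2, 16)
    stages["Out2"] = (stages["Out1"][0] * 2, stages["Out1"][1] * 2, 8)
    return stages

# 640x512 is an arbitrary example resolution, not a value from the table.
shapes = pyramid_shapes(640, 512)
```

For this example input, `shapes["Out0"]` is `(160, 128, 32)` and `shapes["Out2"]` is `(640, 512, 8)`, matching the W/4×H/4×32 and W×H×8 entries.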

Model	CA	DCN	Focal loss	Acc/mm	Comp/mm	Overall/mm
CasMVSNet^[10]	×	×	×	0.325	0.385	0.355
Model 1	√	×	×	0.343	0.320	0.331
Model 2	√	√	×	0.329	0.309	0.319
Ours	√	√	√	0.323	0.287	0.305
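
On DTU, the Overall score is conventionally the mean of accuracy and completeness. The rows above are consistent with that convention up to rounding in the third decimal place, as the short check below shows. The row labels in the code are shorthand for the four ablation variants.

```python
# (Acc, Comp, Overall) in mm, copied from the ablation rows above.
rows = {
    "CasMVSNet": (0.325, 0.385, 0.355),
    "Model 1": (0.343, 0.320, 0.331),
    "Model 2": (0.329, 0.309, 0.319),
    "Ours": (0.323, 0.287, 0.305),
}

def overall(acc, comp):
    """Mean of accuracy and completeness errors."""
    return (acc + comp) / 2

# Largest difference between the recomputed and the reported Overall value.
max_gap = max(abs(overall(a, c) - o) for a, c, o in rows.values())
```

The largest gap is about 0.0005 mm (the second row, where 0.3315 is reported as 0.331), so the reported values agree with the averaging rule.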

Model	Acc/mm	Comp/mm	Overall/mm	GPU memory/MB	Run-time/s
Gipuma^[1]	0.283	0.873	0.578	-	-
SurfaceNet^[5]	0.450	1.040	0.745	-	-
MVSNet^[7]	0.396	0.527	0.462	22 511	1.210
R-MVSNet^[8]	0.385	0.459	0.422	6 915	1.28
D2HC-RMVSNet^[9]	0.395	0.378	0.386	13 946	2.6
CasMVSNet^[10]	0.325	0.385	0.355	9 891	0.492
HighRes-MVSNet^[11]	0.354	0.393	0.373	1 119	0.10
EPM-RMVSNet^[12]	0.468	0.521	0.495	-	-
AACVP-MVSNet^[13]	0.357	0.326	0.341	1 048	-
MCV-MVSNet^[14]	0.353	0.357	0.355	21 400	3.1
Ours	0.323	0.287	0.305	9 434	0.137
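
The percentages quoted in the abstract can be reproduced from the CasMVSNet row and the last row of this table as relative reductions, (baseline - ours) / baseline. The snippet below is a quick arithmetic check; the function name is illustrative.

```python
def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100

# (CasMVSNet, proposed method) pairs read from the table above.
overall_err = relative_reduction(0.355, 0.305)   # Overall/mm
run_time = relative_reduction(0.492, 0.137)      # Run-time/s
gpu_memory = relative_reduction(9891, 9434)      # GPU memory/MB

print(f"{overall_err:.2f}% {run_time:.2f}% {gpu_memory:.2f}%")
# prints: 14.08% 72.15% 4.62%
```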

Model	Mean	Family	Francis	Horse	Lighthouse	M60	Panther	Playground	Train
MVSNet^[7]	43.48	55.99	28.55	25.07	50.79	53.96	50.86	47.90	34.69
R-MVSNet^[8]	48.40	69.96	46.65	32.59	42.95	51.88	48.80	52.00	42.38
D2HC-RMVSNet^[9]	59.20	74.69	56.04	49.42	60.08	59.81	59.61	60.04	53.92
CasMVSNet^[10]	56.84	76.37	58.45	46.26	55.81	56.11	54.06	58.18	49.51
HighRes-MVSNet^[11]	49.81	66.62	44.17	30.84	55.13	53.20	50.32	55.45	42.73
Ours	61.43	78.74	64.79	53.37	60.31	61.86	58.75	58.42	55.20
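
The Mean column appears to be the unweighted average of the eight per-scene scores. The check below recomputes it for three of the rows; the values are copied from the table and the helper name is illustrative.

```python
# Per-scene scores in column order: Family ... Train (Mean column excluded).
scores = {
    "MVSNet": [55.99, 28.55, 25.07, 50.79, 53.96, 50.86, 47.90, 34.69],
    "R-MVSNet": [69.96, 46.65, 32.59, 42.95, 51.88, 48.80, 52.00, 42.38],
    "Ours": [78.74, 64.79, 53.37, 60.31, 61.86, 58.75, 58.42, 55.20],
}

def mean_score(per_scene):
    """Unweighted average over the scenes of the benchmark subset."""
    return sum(per_scene) / len(per_scene)

means = {name: round(mean_score(vals), 2) for name, vals in scores.items()}
```

The recomputed means are 43.48, 48.40, and 61.43, matching the Mean column for these rows.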
