Neural Radiance Fields Based on Explicit Feature Matching and Scaled Dot-product Attention

doi:10.16182/j.issn1004731x.joss.25-0568

Abstract:

Neural radiance fields (NeRF) are prone to artifacts and blurred textures when synthesizing novel views from sparse input views or in complex scenes. To address this problem, this paper proposes EMD-NeRF, a neural radiance field method based on explicit feature matching and scaled dot-product attention. A multi-scale feature extraction network extracts multi-scale features from the sparse input views. A fusion dot-product module computes cross-view interaction information and serves as a shared branch. Cosine similarity is adopted as the matching cue for similarity-embedded volume rendering. A regularization loss function improves the quality of the scene's color and density fields and the realism of the rendered novel views. Experimental results on multiple open-source datasets all demonstrate the effectiveness of the proposed method.

Key words: neural rendering, neural radiance field (NeRF), view synthesis, 3D reconstruction, feature matching
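
The abstract names two generic building blocks, scaled dot-product attention and cosine similarity, without giving the network details. The following NumPy sketch shows only the textbook form of each operation; it is not the paper's fusion dot-product module. The array shapes, the variable names `feat_view_a` and `feat_view_b`, and the random features are all assumptions made for illustration.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Return softmax(q k^T / sqrt(d)) v and the attention weights.

    q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between a and b, both of shape (n, d)."""
    a_n = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.sum(a_n * b_n, axis=-1)


# Hypothetical features of 4 sampled points seen from two source views.
rng = np.random.default_rng(0)
feat_view_a = rng.normal(size=(4, 8))
feat_view_b = rng.normal(size=(4, 8))

# View A attends to view B (cross-view interaction in its simplest form).
fused, weights = scaled_dot_product_attention(feat_view_a, feat_view_b, feat_view_b)

# One similarity score per point, usable as a matching cue.
sim = cosine_similarity(feat_view_a, feat_view_b)
```

In this sketch each row of `weights` sums to 1, and `sim` lies in [-1, 1], with values near 1 indicating that the two views agree on a point's features.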

CLC number: TP391.9

Cao Mingwei, Wang Fengna, Wang Zilong, et al. Neural Radiance Fields Based on Explicit Feature Matching and Scaled Dot-product Attention[J]. Journal of System Simulation, 2026, 38(3): 572-583.

Figures and tables (14): Figures 1-9 and Tables 1-5.

References (47)

[1]	Wang Zili, Gao Juntian, Yang Dezhen, et al. Reliability Simulation Testing and Verification Technologies for Intelligent Systems: Frontiers, Progress, and Challenges[J]. Journal of System Simulation, 2025, 37(7): 1583-1606.
[2]	Deng Kangle, Liu A, Zhu Junyan, et al. Depth-supervised NeRF: Fewer Views and Faster Training for Free[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 12872-12881.
[3]	Roessle Barbara, Barron J T, Mildenhall B, et al. Dense Depth Priors for Neural Radiance Fields from Sparse Input Views[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 12882-12891.
[4]	Mescheder Lars, Oechsle Michael, Niemeyer Michael, et al. Occupancy Networks: Learning 3D Reconstruction in Function Space[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 4455-4465.
[5]	Liu Lingjie, Gu Jiatao, Zaw Lin Kyaw, et al. Neural Sparse Voxel Fields[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 15651-15663.
[6]	Kellnhofer P, Jebe L C, Jones A, et al. Neural Lumigraph Rendering[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 4285-4295.
[7]	Park J J, Florence P, Straub J, et al. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 165-174.
[8]	Michalkiewicz Mateusz, Jhony Kaesemodel Pontes, Jack Dominic, et al. Implicit Surface Representations as Layers in Neural Networks[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 4742-4751.
[9]	Chen Zhiqin, Zhang Hao. Learning Implicit Fields for Generative Shape Modeling[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 5932-5941.
[10]	Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis[J]. Communications of the ACM, 2022, 65(1): 99-106.
[11]	Xu Dejia, Jiang Yifan, Wang Peihao, et al. SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image[C]//Computer Vision – ECCV 2022. Cham: Springer Nature Switzerland, 2022: 736-753.
[12]	Kim Mijeong, Seo Seonguk, Han Bohyung. InfoNeRF: Ray Entropy Minimization for Few-shot Neural Volume Rendering[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 12902-12911.
[13]	Truong P, Rakotosaona M J, Manhardt F, et al. SPARF: Neural Radiance Fields from Sparse and Noisy Poses[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 4190-4200.
[14]	Chen Di, Liu Yu, Huang Lianghua, et al. GeoAug: Data Augmentation for Few-shot NeRF with Geometry Constraints[C]//Computer Vision – ECCV 2022. Cham: Springer Nature Switzerland, 2022: 322-337.
[15]	Niemeyer Michael, Barron J T, Mildenhall B, et al. RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 5470-5480.
[16]	Fu Hongyu, Yu Xin, Li Lincheng, et al. CBARF: Cascaded Bundle-adjusting Neural Radiance Fields from Imperfect Camera Poses[J]. IEEE Transactions on Multimedia, 2024, 26: 9304-9315.
[17]	Tang Jiaxiang, Chen Xiaokang, Wang Jingbo, et al. Compressible-composable NeRF Via Rank-residual Decomposition[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 14798-14809.
[18]	Yu A, Li Ruilong, Tancik M, et al. PlenOctrees for Real-time Rendering of Neural Radiance Fields[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 5732-5741.
[19]	Johari Mohammad Mahdi, Lepoittevin Yann, Fleuret François. GeoNeRF: Generalizing NeRF with Geometry Priors[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 18344-18347.
[20]	Chibane Julian, Bansal A, Lazova Verica, et al. Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 7907-7916.
[21]	Wang Qianqian, Wang Zhicheng, Genova K, et al. IBRNet: Learning Multi-view Image-based Rendering[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 4688-4697.
[22]	Chen Anpei, Xu Zexiang, Zhao Fuqiang, et al. MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-view Stereo[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 14104-14113.
[23]	Rematas K, Martin-Brualla R, Ferrari V. ShaRF: Shape-conditioned Radiance Fields from a Single View[EB/OL]. (2021-06-23)[2025-04-12].
[24]	Yu A, Ye V, Tancik M, et al. pixelNeRF: Neural Radiance Fields from One or Few Images[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 4576-4585.
[25]	Yu Yingchen, Wu Rongliang, Yifang Men, et al. MorphNeRF: Text-guided 3D-aware Editing via Morphing Generative Neural Radiance Fields[J]. IEEE Transactions on Multimedia, 2024, 26: 8516-8528.
[26]	Lee Jie Long, Li Chen, Lee Gim Hee. DiSR-NeRF: Diffusion-guided View-consistent Super-resolution NeRF[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 20561-20570.
[27]	Wang Chuandong, Cai Meng, Li Jianxun. DKD-NeRF: Depth Knowledge-distillation NeRF for Sparse Input Views[C]//Proceedings of the 2024 4th International Joint Conference on Robotics and Artificial Intelligence. New York: ACM, 2025: 119-123.
[28]	Niemeyer Michael, Mescheder Lars, Oechsle Michael, et al. Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 3501-3512.
[29]	Niemeyer Michael, Mescheder Lars, Oechsle Michael, et al. Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5378-5388.
[30]	Kerbl Bernhard, Kopanas Georgios, Leimkuehler Thomas, et al. 3D Gaussian Splatting for Real-time Radiance Field Rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139.
[31]	Diels Laurens, Vlaminck Michiel, Philips Wilfried, et al. Fast 3D Gaussian Splatting Rendering via Easily Integrable Improvements[J]. IEEE Signal Processing Letters, 2025, 32: 381-385.
[32]	Guo Shuai, Wang Qiuwen, Gao Yijie, et al. Depth-guided Robust Point Cloud Fusion NeRF for Sparse Input Views[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(9): 8093-8106.
[33]	Jain A, Tancik M, Abbeel P. Putting NeRF on a Diet: Semantically Consistent Few-shot View Synthesis[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 5865-5874.
[34]	Jang W, Agapito L. CodeNeRF: Disentangled Neural Radiance Fields for Object Categories[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 12929-12938.
[35]	Li Jiaxin, Feng Zijian, She Qi, et al. MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 12558-12568.
[36]	Trevithick A, Yang Bo. GRF: Learning a General Radiance Field for 3D Representation and Rendering[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 15162-15172.
[37]	Liu Yuan, Peng Sida, Liu Lingjie, et al. Neural Rays for Occlusion-aware Image-based Rendering[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 7814-7823.
[38]	Cao Mingwei, Wang Fengna, Sun Dengdi, et al. BCS-NeRF: Bundle Cross-sensing Neural Radiance Fields[C]//Proceedings of the 6th ACM International Conference on Multimedia in Asia. New York: ACM, 2024: 37.
[39]	Xu Haofei, Zhang Jing, Cai Jianfei, et al. GMFlow: Learning Optical Flow via Global Matching[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 8111-8120.
[40]	Jensen Rasmus, Dahl Anders, Vogiatzis George, et al. Large Scale Multi-view Stereopsis Evaluation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 406-413.
[41]	Mildenhall B, Srinivasan P P, Ortiz-Cayon R, et al. Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines[J]. ACM Transactions on Graphics, 2019, 38(4): 29.
[42]	Wang Zhou, Bovik A C, Sheikh H R, et al. Image Quality Assessment: from Error Visibility to Structural Similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[43]	Zhang R, Isola P, Efros A A, et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 586-595.
[44]	Yang Jiawei, Pavone M, Wang Yue. FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 8254-8263.
[45]	Seo Seunghyeon, Han Donghoon, Chang Yeonjin, et al. MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 20659-20668.
[46]	Wang Guangcong, Chen Zhaoxi, Loy Chen Change, et al. SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 9031-9042.
[47]	Zhu Zehao, Fan Zhiwen, Jiang Yifan, et al. FSGS: Real-time Few-shot View Synthesis Using Gaussian Splatting[C]//Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland, 2025: 145-163.

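Input	Method	Category	PSNR	SSIM	LPIPS
3 views	SRF^[20]	Pre-training	15.68	0.698	0.281
3 views	PixelNeRF^[24]	Pre-training	18.95	0.710	0.269
3 views	MVSNeRF^[22]	Pre-training	26.63	0.931	0.168
3 views	GeoNeRF^[19]	Pre-training	24.01	0.928	0.162
3 views	IBRNet^[21]	Pre-training	26.04	0.917	0.190
3 views	EMD-NeRF	Pre-training	27.23	0.936	0.159
3 views	FreeNeRF^[44]	Regularization	18.02	0.680
3 views	DietNeRF^[33]	Regularization	11.85	0.633	0.314
3 views	RegNeRF^[15]	Regularization	18.89	0.745	0.190
3 views	MixNeRF^[45]	Regularization	18.95	0.744	0.203
3 views	SparseNeRF^[46]	Regularization	19.55	0.769	0.201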

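Input	Method	Category	PSNR	SSIM	LPIPS
3 views	SRF^[20]	Pre-training	17.07	0.436	0.529
3 views	PixelNeRF^[24]	Pre-training	16.17	0.438	0.512
3 views	MVSNeRF^[22]	Pre-training	21.93	0.795	0.252
3 views	GeoNeRF^[19]	Pre-training	21.10	0.827	0.293
3 views	IBRNet^[21]	Pre-training	21.79	0.786	0.279
3 views	EMD-NeRF	Pre-training	22.27	0.802	0.250
3 views	FreeNeRF^[44]	Regularization	19.63	0.612	0.308
3 views	DietNeRF^[33]	Regularization	14.94	0.370	0.496
3 views	RegNeRF^[15]	Regularization	19.08	0.587	0.336
3 views	MixNeRF^[45]	Regularization	19.27	0.629	0.236
3 views	SparseNeRF^[46]	Regularization	19.86	0.624	0.328
3 views	FSGS^[47]	Regularization	20.31	0.652	0.288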

Input	Method	Category	PSNR	SSIM	LPIPS
3 views	PixelNeRF^[24]	Pre-training	7.39	0.658	0.411
3 views	MVSNeRF^[22]	Pre-training	23.62	0.897	0.176
3 views	IBRNet^[21]	Pre-training	22.44	0.874	0.195
3 views	EMD-NeRF	Pre-training	23.20	0.897	0.163

Module	PSNR	SSIM	LPIPS
MSF	23.20	0.874	0.262
MSF+self	23.51	0.878	0.254
MSF+dot	26.56	0.929	0.173
MSF+self+dot	26.67	0.930	0.168
MSF+FDCM	26.90	0.932	0.167

MSF	FDCM	L_MSE	PSNR	SSIM	LPIPS
×	×	√	22.50	0.888	0.170
√	×	√	22.71	0.889	0.176
×	√	√	22.97	0.891	0.172
√	√	√	23.20	0.897	0.163
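
The tables above compare methods by PSNR, SSIM and LPIPS; higher is better for PSNR and SSIM, lower is better for LPIPS. As a reading aid, the sketch below computes PSNR from the mean squared error between a rendered image and its ground truth. The [0, 1] pixel range (`max_val=1.0`) and the toy images are assumptions; the evaluation settings actually used for the tables are not stated in this section.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy example (assumed data): every pixel is off by 0.1, so MSE is 0.01.
target = np.full((4, 4, 3), 0.5)
pred = target + 0.1
value = psnr(pred, target)  # about 20 dB
```

Because PSNR is logarithmic in the error, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.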