Go Little NeRF; You Are Free Now: This AI Approach Improves Few-shot Neural Rendering Capability

Generating high-fidelity 3D renders of real-world scenes is becoming increasingly feasible thanks to recent advances in neural radiance fields (NeRF). With NeRF, you can bring real-world scenes into a virtual world and render them in 3D from different perspectives.

NeRF is a deep learning-based approach that represents a scene as a continuous 5D function. It maps a 3D coordinate and a 2D viewing direction to a volume density and a view-dependent color, that is, how much light is emitted from that point along that direction. This function is approximated by a multi-layer perceptron (MLP) trained on a set of input images and their corresponding camera parameters.
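To make the 5D function more concrete, here is a minimal sketch in plain Python. It assumes the standard NeRF formulation, in which the field returns a density and a view-dependent color and a pixel is formed by compositing samples along a camera ray. The `toy_field` function is a hypothetical, hand-written stand-in for the trained MLP (a red unit sphere), and `render_ray` is an illustrative name, not code from any NeRF release.

```python
import math

def toy_field(position, direction):
    """Hypothetical stand-in for the trained MLP: (position, direction) -> (density, rgb)."""
    x, y, z = position
    inside = x * x + y * y + z * z < 1.0          # a solid unit sphere at the origin
    density = 5.0 if inside else 0.0
    # Simple view dependence: brighter when looking down the negative z axis.
    shade = 0.5 + 0.5 * max(0.0, -direction[2])
    return density, (shade, 0.2, 0.2)

def render_ray(origin, direction, near, far, num_samples):
    """Composite color along one ray with the usual volume rendering quadrature."""
    delta = (far - near) / num_samples            # spacing between samples
    transmittance = 1.0                           # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for i in range(num_samples):
        t = near + (i + 0.5) * delta              # sample at the midpoint of each bin
        position = tuple(o + t * d for o, d in zip(origin, direction))
        density, rgb = toy_field(position, direction)
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return color, transmittance

# A ray that passes through the sphere, and one that misses it.
hit_color, hit_t = render_ray((0.0, 0.0, 3.0), (0.0, 0.0, -1.0), 0.0, 6.0, 60)
miss_color, miss_t = render_ray((5.0, 5.0, 3.0), (0.0, 0.0, -1.0), 0.0, 6.0, 60)
```

In a real NeRF, `toy_field` would be the MLP, and training would adjust its weights until rays rendered this way match the pixels of the input images.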

By capturing the underlying 3D geometry and lighting of the scene, NeRF can generate novel views of the scene from arbitrary viewpoints. This way, you can have an interactive virtual exploration of the scene. Think of it like the bullet-dodging scene in the first Matrix movie.

As with all emerging technologies, NeRF is not without its flaws. A common problem is that it can overfit to the training views, so it struggles with novel view synthesis when only a few input images are available. This is known as the few-shot neural rendering problem.

There have been attempts to tackle the few-shot neural rendering problem. Transfer learning methods and depth-supervised methods have been tried, and they were successful to some extent. However, these approaches require pre-training on large-scale datasets or complex training pipelines, which results in computational overhead.

What if there was a way to tackle this problem more efficiently? What if we could synthesize novel views even with sparse inputs? Time to meet FreeNeRF.

Illustration of occlusion regularization. Source: https://arxiv.org/abs/2303.07418

Frequency regularized NeRF (FreeNeRF) is a novel approach proposed to address the few-shot neural rendering problem. It is pretty simple to add to a plain NeRF model, as it only requires adding a few lines of code. FreeNeRF introduces two regularization terms: frequency regularization and occlusion regularization.

Frequency regularization stabilizes learning and prevents catastrophic overfitting at the start of training by directly regularizing the visible frequency bands of NeRF's inputs. The underlying observation is that, in the few-shot setting, NeRF performance drops significantly as higher-frequency inputs are presented to the model. FreeNeRF therefore masks the visible frequency spectrum of the inputs as a function of the training step: it starts with only the low frequencies visible and gradually reveals high-frequency information, which avoids both early overfitting and over-smoothed results.
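The idea of gradually revealing frequency bands can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: it assumes a sinusoidal positional encoding of the input and a linear schedule over the first `reg_steps` training steps, and the function names and the exact schedule are assumptions made for this example.

```python
import math

def positional_encoding(x, num_freqs):
    """Encode a scalar x as [sin(2^0 x), cos(2^0 x), ..., sin(2^(L-1) x), cos(2^(L-1) x)]."""
    encoded = []
    for k in range(num_freqs):
        encoded.append(math.sin((2.0 ** k) * x))
        encoded.append(math.cos((2.0 ** k) * x))
    return encoded

def frequency_mask(step, reg_steps, num_freqs):
    """Per-band weights in [0, 1]; low bands are visible first, higher bands fade in."""
    if step >= reg_steps:
        return [1.0] * num_freqs                   # regularization finished: all bands visible
    # Assumed linear schedule: the lowest band is always visible, the rest open up over time.
    visible = num_freqs * step / reg_steps + 1.0   # fractional number of visible bands
    return [min(1.0, max(0.0, visible - k)) for k in range(num_freqs)]

def masked_encoding(x, step, reg_steps, num_freqs):
    """Positional encoding with the not-yet-visible frequency bands scaled down or zeroed."""
    encoded = positional_encoding(x, num_freqs)
    mask = frequency_mask(step, reg_steps, num_freqs)
    # Each band contributes a sin and a cos entry, so entry i belongs to band i // 2.
    return [value * mask[i // 2] for i, value in enumerate(encoded)]

early = frequency_mask(0, 100, 4)     # only the lowest band is visible
partial = frequency_mask(10, 80, 4)   # the second band is half faded in
late = frequency_mask(100, 100, 4)    # every band is visible
```

Because the mask only rescales the encoded inputs, the network architecture and the loss stay untouched, which is consistent with the claim that the method needs only a few extra lines of code.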

Occlusion regularization, on the other hand, penalizes density fields near the camera. These fields cause so-called floaters: artifacts that show up in the rendered image as spurious blobs of density floating close to the camera. They arise in the regions with the least overlap between training views, which are difficult to estimate from the limited information available. To eliminate them, FreeNeRF penalizes dense fields near the camera.
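A penalty of this kind can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's code: densities are assumed to be sampled along a single ray and ordered from nearest to farthest, and `num_near` (how many of the closest samples to penalize) is a hypothetical parameter name.

```python
def occlusion_loss(densities, num_near):
    """Average density along a ray, counting only the num_near samples closest to the camera."""
    if not densities:
        return 0.0
    near_densities = densities[:num_near]          # samples assumed ordered near to far
    # Dividing by the full sample count keeps the scale comparable across rays.
    return sum(near_densities) / len(densities)

# Density right in front of the camera is penalized; density farther away is not.
floater_ray = [2.0, 1.0, 0.0, 5.0]
clean_ray = [0.0, 0.0, 0.0, 5.0]
floater_penalty = occlusion_loss(floater_ray, 2)
clean_penalty = occlusion_loss(clean_ray, 2)
```

Adding such a term to the training loss with a small weight pushes the model to keep the space directly in front of the camera empty, while leaving density at the actual surface untouched.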

Example novel view synthesis results from sparse inputs. Source: https://jiawei-yang.github.io/FreeNeRF/

FreeNeRF combines these two regularization methods to propose a simple baseline that outperforms previous state-of-the-art methods on multiple datasets. It adds almost no additional computation cost. On top of that, it is dependency-free and overhead-free, making it a practical and efficient solution to the few-shot neural rendering problem.

Check out the Paper and Project. All credit for this research goes to the researchers on this project.

Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.