Since the Neural Radiance Field (NeRF) emerged, novel view synthesis research has advanced significantly. NeRF's main idea is to use differentiable volume rendering to optimize Multi-layer Perceptron (MLP) networks that encode the scene's density and radiance fields. After training, NeRF can render high-quality images from novel camera poses. Although NeRF produces photo-realistic renderings, training one can take hours or days because optimizing a deep neural network is slow, which restricts the range of applications for which it can be used.
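To make the volume rendering step concrete, here is a minimal sketch of the standard discrete compositing rule applied along a single ray. The function name `composite_ray` and its arguments are illustrative, not taken from any NeRF codebase; in a real NeRF the densities and colors would come from the MLP rather than from hand-written lists.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray (front to back).

    densities: non-negative density (sigma) at each sample
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between consecutive samples
    Returns the accumulated pixel color and the remaining transmittance.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

Every operation here is differentiable with respect to the densities and colors, which is what allows the network to be trained from image losses alone.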
Recent studies show that grid-based techniques like Plenoxels, DVGO, TensoRF, and Instant-NGP can train a NeRF in minutes. However, as a scene gets larger, the memory use of such grid-based representations grows cubically. Voxel pruning, tensor decomposition, and hash indexing are among the techniques that have been proposed to reduce memory usage. Nevertheless, these algorithms can only handle bounded scenes when grids are built in the original Euclidean space. To represent unbounded scenes, a commonly used approach is a space-warping technique that maps an unbounded space to a bounded one.
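The cubic growth is easy to see with a little arithmetic. The sketch below assumes a plain dense grid with four 32-bit values per voxel; both numbers are hypothetical and chosen only for illustration, since the methods named above each store different features.

```python
def dense_grid_bytes(resolution, channels=4, bytes_per_value=4):
    """Memory of a dense resolution^3 grid (channels and value size are assumed)."""
    return resolution ** 3 * channels * bytes_per_value

for resolution in (128, 256, 512, 1024):
    gib = dense_grid_bytes(resolution) / 2 ** 30
    print(f"{resolution}^3 grid: {gib:.3f} GiB")
```

Doubling the resolution along each axis multiplies the memory by eight, which is why pruning, decomposition, or hashing becomes necessary for large scenes.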
Typically, two types of warping functions are used. (1) For forward-facing scenes (Fig. 1 (a)), the Normalized Device Coordinate (NDC) warping maps an infinitely far view frustum to a bounded box by squashing the space along the z-axis. (2) For 360° object-centric unbounded scenes (Fig. 1 (b)), the inverse-sphere warping maps an infinitely large space to a bounded sphere through a sphere inversion transformation. However, both warping functions assume particular camera trajectory patterns and cannot accommodate arbitrary ones. Rendering quality suffers in particular when a trajectory is long and covers several objects of interest, a case known as a free trajectory, as seen in Fig. 1 (c).
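The two warping functions can be sketched in a few lines. The NDC version follows the convention of the original NeRF paper (camera looking down the negative z-axis, with a near plane at distance `near`). The inverse-sphere version shown is one commonly used contraction form that leaves the unit ball unchanged and squeezes everything else into a ball of radius 2; the exact formulation in any given method may differ. Function and parameter names are illustrative.

```python
import math

def ndc_warp(point, focal, width, height, near=1.0):
    """Map a camera-space point with z <= -near into the NDC box [-1, 1]^3."""
    x, y, z = point
    return (
        -focal / (width / 2.0) * x / z,
        -focal / (height / 2.0) * y / z,
        1.0 + 2.0 * near / z,      # z = -near -> -1, z -> -infinity -> +1
    )

def inverse_sphere_warp(point):
    """Contract all of 3D space into a ball of radius 2 (assumed variant)."""
    norm = math.sqrt(sum(c * c for c in point))
    if norm <= 1.0:
        return tuple(point)        # the inner region is left untouched
    scale = (2.0 - 1.0 / norm) / norm
    return tuple(c * scale for c in point)
```

In both cases an infinitely distant point lands on the boundary of a bounded region, so a finite grid can cover the whole scene. Each function, however, bakes in an assumption about where the cameras are: a single frustum for NDC and a central object for the sphere.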
Performance on free trajectories degrades because spatial representation capacity is allocated unevenly. In particular, when the trajectory is long and narrow, many regions of the scene are empty and invisible to every input view. Yet existing approaches tile their grids uniformly over the whole scene regardless of whether a region is empty. As a result, much of the representation capacity is wasted on unused space. Although this waste can be reduced with progressive empty-voxel pruning, tensor decomposition, or hash indexing, the results are still blurry because GPU memory is limited.
Figure 1: Top: (a) Forward-facing camera trajectory. (b) 360° object-centric camera trajectory. (c) Free camera trajectory. Case (c) is especially challenging because the camera trajectory is long and covers several foreground objects. Bottom: Images rendered by recent fast NeRF training methods and by F2-NeRF on a scene with a free trajectory.
Additionally, in Fig. 1 (c) many foreground objects are observed by dense, close input views, whereas the background is covered only by sparse, distant views. In this scenario, fine grids should be assigned to the foreground objects to preserve shape details, and coarse grids to the background, to make the best use of the grid's representation capacity. However, existing grid-based methods distribute grids uniformly over the space, which uses that capacity inefficiently. To address these issues, researchers from the University of Hong Kong, S-Lab NTU, the Max Planck Institute, and Texas A&M University propose F2-NeRF (Fast-Free-NeRF), the first fast NeRF training approach that supports free camera trajectories in large, unbounded scenes.
F2-NeRF, built on the Instant-NGP framework, preserves the fast convergence of the hash-grid representation and can be trained well on unbounded scenes with diverse camera trajectories. In F2-NeRF, the researchers lay out the criteria that a proper warping function must satisfy for an arbitrary camera configuration. Based on these criteria, they develop perspective warping, a general space-warping technique that can be applied to any camera trajectory.
The fundamental principle of perspective warping is to first describe the position of a 3D point p by concatenating the 2D coordinates of the projections of p in the input images. Then, using Principal Component Analysis (PCA), these 2D coordinates are mapped into a compact 3D subspace. The researchers demonstrate empirically that the proposed perspective warping generalizes the existing NDC warping and inverse-sphere warping to arbitrary trajectories: it can handle arbitrary trajectories, and it automatically degenerates to these two warping functions in forward-facing scenes or 360° object-centric scenes.
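The two steps described above (concatenate projections, then apply PCA) can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses ideal pinhole cameras with a shared focal length, assumes every point lies in front of every camera, and fits a single PCA over the whole set of sample points. All names (`project`, `fit_perspective_warp`, the example cameras) are hypothetical.

```python
import numpy as np

def project(points, cam_rotations, cam_centers, focal=1.0):
    """Project N world points into C pinhole cameras; returns an (N, 2C) array."""
    feats = []
    for rotation, center in zip(cam_rotations, cam_centers):
        cam_pts = (points - center) @ rotation.T            # world -> camera
        feats.append(focal * cam_pts[:, :2] / cam_pts[:, 2:3])
    return np.concatenate(feats, axis=1)

def fit_perspective_warp(sample_points, cam_rotations, cam_centers):
    """Fit PCA on concatenated projections and return a 3D warp function."""
    feats = project(sample_points, cam_rotations, cam_centers)
    mean = feats.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
    basis = vt[:3]                                           # (3, 2C)

    def warp(points):
        return (project(points, cam_rotations, cam_centers) - mean) @ basis.T

    return warp

# Toy setup: three cameras on the x-axis, all looking down +z.
rng = np.random.default_rng(0)
cam_centers = np.array([[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cam_rotations = [np.eye(3)] * 3
samples = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 10.0], size=(500, 3))
warp = fit_perspective_warp(samples, cam_rotations, cam_centers)
warped = warp(samples)                                       # shape (500, 3)
```

Because the warped coordinates are built from image-plane projections, distant points that move little in the images end up close together, while nearby points that move a lot are spread apart.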
To implement perspective warping in a grid-based NeRF framework, they also provide a space subdivision approach that adaptively uses coarse grids for background regions and fine grids for foreground regions. They conduct comprehensive tests on an unbounded forward-facing dataset, an unbounded 360° object-centric dataset, and a new unbounded free-trajectory dataset. The tests demonstrate that F2-NeRF renders high-quality images on all three datasets, with their different trajectory patterns, using the same perspective warping. Their solution beats standard grid-based NeRF algorithms on the new Free dataset with free camera trajectories, while taking only around 12 minutes to train on a 2080Ti GPU.
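One way to picture adaptive subdivision is an octree that keeps splitting cells near the cameras and leaves distant cells large. The splitting rule below (split while the nearest camera is closer than `ratio` times the cell side) and all parameter values are assumptions made for illustration; the article does not spell out the exact criterion the authors use.

```python
import math

def subdivide(center, side, cameras, ratio=3.0, min_side=1.0):
    """Return leaf cells as (center, side) pairs; cells near cameras are finer."""
    nearest = min(math.dist(center, cam) for cam in cameras)
    if side <= min_side or nearest >= ratio * side:
        return [(center, side)]          # far from all cameras: keep it coarse
    half = side / 2.0
    leaves = []
    for dx in (-1, 1):
        for dy in (-1, 1):
            for dz in (-1, 1):
                child = (
                    center[0] + dx * half / 2.0,
                    center[1] + dy * half / 2.0,
                    center[2] + dz * half / 2.0,
                )
                leaves.extend(subdivide(child, half, cameras, ratio, min_side))
    return leaves

# Hypothetical scene: one camera at the origin inside a 64-unit cube.
leaves = subdivide((0.0, 0.0, 0.0), 64.0, [(0.0, 0.0, 0.0)])
```

The resulting leaves still tile the whole cube, but the cells around the camera are much smaller than those near the boundary, so capacity is concentrated where the input views are dense and close.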
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.