PanoHead synthesizes view-consistent full-head images using only single-view images.


In computer vision and graphics, photo-realistic portrait synthesis has long been a central problem, with downstream applications in virtual avatars, telepresence, immersive gaming, and many other areas. Recent Generative Adversarial Networks (GANs) have reached an image quality that is often indistinguishable from genuine photographs. Contemporary generative methods, however, operate on 2D convolutional networks and do not model the underlying 3D scene, so 3D consistency cannot be guaranteed when synthesizing head images under different poses. Traditional methods instead rely on a parametric textured mesh model learned from large 3D scan collections to produce 3D heads with varied shapes and appearances.

The resulting images, however, lack fine detail and suffer from limited expressiveness and perceptual quality. With the advent of differentiable rendering and implicit neural representations, conditional generative models have been developed to produce more realistic 3D-aware face images. These methods, however, frequently depend on multi-view image or 3D scan supervision, which is difficult to obtain and covers a constrained appearance distribution because it is typically captured in controlled environments. Recent progress in implicit neural representations for 3D scene modeling and in GANs for image synthesis has accelerated the development of 3D-aware generative models.

Figure 1 shows how PanoHead enables high-fidelity geometry and 360° view-consistent, photo-realistic full-head image synthesis, producing realistic 3D portraits from a single view.

Among these, the pioneering 3D GAN EG3D achieves impressive view-consistent image synthesis quality while being trained only on in-the-wild single-view image collections. Such 3D GANs, however, can only synthesize near-frontal views. Researchers from ByteDance and the University of Wisconsin-Madison propose PanoHead, a novel 3D-aware GAN trained solely on unstructured in-the-wild photos that enables high-quality, complete 3D head synthesis in 360°. Many immersive interaction scenarios, including telepresence and digital avatars, benefit from the model's ability to synthesize consistent 3D heads viewable from all angles. They believe theirs is the first 3D GAN approach to fully realize 360° 3D head synthesis.

There are several major technical obstacles to full 3D head synthesis with 3D GAN frameworks such as EG3D. Many 3D GANs cannot separate foreground from background, leading to 2.5D head geometry: large poses cannot be rendered because the background, typically modeled as a wall-like structure, becomes entangled with the generated head in 3D. To address this, they develop a foreground-aware tri-discriminator that, using prior knowledge from 2D image segmentation, jointly learns to decompose the foreground head in 3D space. Additionally, hybrid 3D scene representations such as the tri-plane, despite their efficiency and compactness, introduce significant projection ambiguity for 360° camera poses, resulting in a "mirrored face" on the back of the head.
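
To make the foreground-aware discriminator idea concrete, below is a minimal PyTorch-style sketch of how such a discriminator input could be assembled, conditioning the critic on a foreground mask in addition to the rendered RGB. The function name, the 7-channel layout, and the pairing of the super-resolved image with the upsampled raw rendering are illustrative assumptions, not the authors' exact specification.

```python
import torch
import torch.nn.functional as F

def tri_discriminator_input(rgb_sr, rgb_raw, fg_mask):
    """Assemble the input a foreground-aware discriminator could critique:
    the super-resolved RGB, the upsampled low-res neural rendering, and a
    foreground mask. For real images the mask would come from an off-the-shelf
    2D segmenter; for fakes it would be the accumulated density (alpha) from
    volume rendering. (Channel layout is an assumption for illustration.)"""
    rgb_raw_up = F.interpolate(rgb_raw, size=rgb_sr.shape[-2:],
                               mode='bilinear', align_corners=False)
    mask_up = F.interpolate(fg_mask, size=rgb_sr.shape[-2:],
                            mode='bilinear', align_corners=False)
    return torch.cat([rgb_sr, rgb_raw_up, mask_up], dim=1)  # (B, 7, H, W)
```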

To address this problem, they introduce a novel 3D tri-grid volume representation that disentangles frontal features from the back of the head while retaining the efficiency of tri-plane representations. Finally, obtaining accurate camera extrinsics for in-the-wild back-of-head images for 3D GAN training is quite challenging, and these images are aligned differently from frontal photos with detectable facial landmarks. This alignment gap leads to unappealing head geometry and a noisy appearance. They therefore propose a novel two-stage alignment scheme that reliably aligns images from all viewpoints and considerably eases 3D GAN training.
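
As a rough illustration of the tri-grid idea, the sketch below samples point features from three axis-aligned feature stacks, where each classic tri-plane is extended with a small depth dimension so that front- and back-of-head content no longer share the same 2D cell. Tensor shapes, the summation across grids, and the axis ordering are assumptions for illustration; PanoHead's actual formulation may differ in detail.

```python
import torch
import torch.nn.functional as F

def sample_trigrid(grids, xyz):
    """Sample features for 3D points from a tri-grid representation.

    grids: list of three tensors shaped (N, C, D, H, W); each is a tri-plane
           enriched with a small depth dimension D along its orthogonal axis.
    xyz:   (N, P, 3) point coordinates normalized to [-1, 1].
    """
    # (plane-u, plane-v, orthogonal-depth) axis indices for each grid.
    axis_sets = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]
    feats = 0
    for grid, (u, v, d) in zip(grids, axis_sets):
        # grid_sample on 5D input expects coords ordered (x, y, z) -> (W, H, D).
        coords = torch.stack([xyz[..., u], xyz[..., v], xyz[..., d]], dim=-1)
        coords = coords.view(coords.shape[0], 1, 1, -1, 3)        # (N,1,1,P,3)
        sampled = F.grid_sample(grid, coords, mode='bilinear',
                                align_corners=False)              # (N,C,1,1,P)
        feats = feats + sampled.view(grid.shape[0], grid.shape[1], -1)
    return feats.permute(0, 2, 1)                                 # (N, P, C)
```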

Specifically, they propose a camera self-adaptation module that dynamically adjusts the rendering camera positions to accommodate alignment drift in back-of-head images. As seen in Figure 1, their approach significantly improves the ability of 3D GANs to adapt to in-the-wild full-head photos taken from arbitrary viewpoints. The resulting 3D GAN generates high-fidelity 360° RGB images and geometry and outperforms state-of-the-art techniques on quantitative metrics. With this model, they demonstrate how to easily create a 3D portrait by reconstructing a complete head in 3D from a single monocular-view image.
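
A minimal sketch of what such a camera self-adaptation module could look like is shown below: learnable per-image residuals are added to the initially estimated camera parameters and optimized jointly with the generator, letting poorly aligned back-of-head crops drift toward a consistent rendering camera. The (yaw, pitch, radius) parameterization and the tanh-bounded offsets are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CameraSelfAdaptation(nn.Module):
    """Illustrative sketch: learn small per-image offsets to the initially
    estimated camera pose so that badly aligned back-of-head crops can drift
    toward a consistent rendering camera during training."""

    def __init__(self, num_images, max_offset=0.1):
        super().__init__()
        # One learnable (yaw, pitch, radius) residual per training image.
        self.residuals = nn.Parameter(torch.zeros(num_images, 3))
        self.max_offset = max_offset

    def forward(self, image_ids, init_pose):
        # init_pose: (B, 3) estimated (yaw, pitch, radius) per image.
        delta = self.max_offset * torch.tanh(self.residuals[image_ids])
        return init_pose + delta  # adjusted pose fed to the volume renderer
```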

The following is a summary of their principal contributions: 

- The first 3D GAN framework capable of view-consistent, high-fidelity 360° full-head image synthesis. They demonstrate their method with high-quality monocular 3D head reconstruction from in-the-wild photos.

- A novel tri-grid formulation for representing 360° 3D head scenes that balances efficiency and expressiveness.

- A tri-discriminator that disentangles 2D background synthesis from 3D foreground head modeling.

- A novel two-stage image alignment technique that adaptively accommodates imprecise camera poses and misaligned image cropping, enabling the training of 3D GANs from in-the-wild photos with widely varying camera poses.
