How a BRDF Auto-Encoder Enables Material-Aware Text-to-3D Generation

The development of 3D assets is essential for many commercial applications, including gaming, cinema, and AR/VR. The traditional 3D asset creation process involves several labor-intensive and time-consuming steps, all of which depend on specialized knowledge and formal aesthetic training. Text-to-3D pipelines, which automatically generate 3D assets from purely textual descriptions, have therefore drawn increasing attention, thanks to recent advances in generation quality and efficiency and their potential to significantly reduce the time and skill demands of traditional 3D asset creation.

These text-to-3D pipelines can produce compelling geometry and appearance by gradually optimizing the target 3D asset, represented as a NeRF or DMTet, with the score distillation sampling (SDS) loss. However, as Figure 1 illustrates, they struggle to recover high-fidelity object materials, which severely restricts their use in real-world applications such as relighting. Although attempts have been made to model the bidirectional reflectance distribution function (BRDF) and Lambertian reflectance in their designs, the neural network responsible for predicting materials lacks the supervision and cues necessary to identify an appropriate material that follows the natural material distribution. In particular, under fixed lighting conditions, the predicted material is frequently entangled with the environment lights.
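The core of SDS-based optimization (introduced in DreamFusion) is that a frozen diffusion model scores renderings of the 3D asset, and its noise-prediction residual is used directly as a gradient on the rendered image. The sketch below is a minimal toy illustration of that update rule; the `toy_denoiser` standing in for a real text-conditioned diffusion model is hypothetical, and a real pipeline would backpropagate this gradient through a differentiable renderer into NeRF or DMTet parameters rather than updating pixels directly.

```python
import numpy as np

def sds_gradient(rendered, denoiser, t, alpha_bar, w=1.0, rng=None):
    """Score distillation sampling gradient: w(t) * (eps_hat - eps).

    rendered : image rendered from the current 3D asset
    denoiser : frozen diffusion model, (noisy_image, t) -> predicted noise
    alpha_bar: cumulative noise-schedule coefficient for timestep t
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    eps = rng.standard_normal(rendered.shape)            # sampled Gaussian noise
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = denoiser(noisy, t)
    # SDS drops the U-Net Jacobian and uses the residual as the gradient
    return w * (eps_hat - eps)

# Hypothetical stand-in denoiser: pulls renderings toward a flat gray "target",
# mimicking how a real diffusion prior pulls renderings toward the text prompt.
target = np.full((8, 8, 3), 0.5)

def toy_denoiser(noisy, t, alpha_bar=0.7):
    return (noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

img = np.zeros((8, 8, 3))                                # current rendering
for _ in range(200):                                     # optimize the "asset"
    img = img - 0.05 * sds_gradient(img, toy_denoiser, t=500, alpha_bar=0.7)
```

After the loop, `img` has been pulled onto the denoiser's target, which is the mechanism by which SDS shapes geometry and appearance when the gradient flows into 3D parameters instead of pixels.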

In this study, researchers from Shanghai AI Laboratory and S-Lab, Nanyang Technological University, leverage material data that is already abundantly available to learn a novel text-to-3D pipeline that successfully disentangles material from environment lighting. Although paired datasets of materials and text descriptions are not available, there are large-scale BRDF material datasets such as MERL BRDF and Adobe Substance3D materials, as well as the real-world BRDF collection TwoShotBRDF. They therefore propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER), which uses a novel latent BRDF auto-encoder to generate realistic, natural-looking materials that accurately match the text prompts.

The latent BRDF auto-encoder is trained to embed the real-world BRDF priors of TwoShotBRDF in a smooth latent space, so that MATLABER predicts BRDF latent codes rather than raw BRDF values. This allows MATLABER to concentrate on selecting the most appropriate material while worrying less about the validity of the predicted BRDF. Thanks to the smooth latent space of the auto-encoder, their method guarantees the realism and coherence of object materials and achieves an effective disentanglement of geometry and appearance. As illustrated in Figure 1, the method can produce 3D assets with high-fidelity content, exceeding earlier state-of-the-art text-to-3D pipelines.
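The idea of predicting latent codes instead of raw material values can be illustrated with a deliberately simple linear auto-encoder. The sketch below is an assumption-laden toy: the 7-D "BRDF parameter" vectors, the 2-D latent size, and the PCA encoder/decoder are all stand-ins for MATLABER's learned MLP auto-encoder trained on TwoShotBRDF. The point it demonstrates is that any latent code decodes to a sample on the learned material manifold, so a downstream network can safely optimize in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a BRDF dataset: each material is a 7-D parameter
# vector (e.g. RGB diffuse, RGB specular, roughness). Real materials occupy a
# low-dimensional manifold; here we synthesize them from 2 latent factors.
true_factors = rng.uniform(0.0, 1.0, size=(500, 2))
mixing = rng.standard_normal((2, 7))
materials = true_factors @ mixing + rng.normal(0.0, 0.01, size=(500, 7))

# Linear auto-encoder via PCA: the top-2 principal directions play the role
# of MATLABER's learned encoder/decoder pair.
mean = materials.mean(axis=0)
_, _, Vt = np.linalg.svd(materials - mean, full_matrices=False)
encode = lambda m: (m - mean) @ Vt[:2].T   # material params -> latent code z
decode = lambda z: z @ Vt[:2] + mean       # latent code z -> material params

z = encode(materials[:5])                  # round-trip a few materials
recon = decode(z)

# Interpolating in latent space still decodes to an in-distribution material,
# which is why predicting z avoids invalid BRDFs.
z_mix = 0.5 * (z[0] + z[1])
mixed_material = decode(z_mix)
```

In MATLABER the decoder is frozen after pre-training, and only the latent code is predicted per surface point during SDS optimization, so every optimization step stays on the material manifold by construction.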

Figure 1: The goal of text-to-3D generation is to create high-quality 3D objects that correspond to given text descriptions. Despite their striking visuals, representative techniques such as DreamFusion and Fantasia3D still fall short of recovering high-fidelity object materials. Specifically, DreamFusion considers only diffuse materials, while Fantasia3D predicts BRDF materials entangled with environment lighting. The proposed method, built on a latent BRDF auto-encoder, can produce natural materials for 3D objects, enabling realistic renderings under various lighting conditions.

More importantly, an accurate estimate of object materials enables tasks that were previously difficult, such as relighting, material editing, and scene manipulation. These downstream tasks are essential in many real-world applications, opening the door to a more practical paradigm of 3D content creation. Additionally, by drawing on multi-modal datasets such as ObjectFolder, their approach could infer tactile and auditory information from the acquired materials, which together form the trinity of material properties for virtual objects.