How a Pretrained NeRF and an Editable NeRF Combine for Text-Driven Localized 3D Object Editing


Industries such as painting, product design, and animation are being reshaped by 3D image synthesis and related technologies. Although new methods such as Neural Radiance Fields (NeRF) make it possible to produce 3D content at scale, broad adoption remains difficult because these methods do not allow precise, local modification of an object's shape and color. Despite several recent attempts at 3D object editing, localized and fine-grained manipulation of 3D objects, especially adding or deleting specific parts in a particular style, remains insufficient and costly. Earlier attempts such as EditNeRF and NeRFEditing offer only restricted, non-versatile editing options, while Text2Mesh and TANGO permit only basic texture and shallow shape changes applied to whole 3D objects.

CLIP-NeRF proposes a generative technique with a disentangled conditional NeRF for object editing, but it struggles to edit only the required part of an object locally and needs a substantial amount of training data for the intended editing category. Its authors also provide an alternative that fine-tunes a single NeRF per scene with a CLIP-driven objective, which modifies an object's appearance but handles shape poorly. To achieve effective and practical localized editing of 3D objects from arbitrary text prompts at scale, a method must be able to make stylistic modifications to specific regions of the object, such as selectively altering color and locally adding and removing densities, as illustrated in Figure 1.

Figure 1 shows the outcomes of the proposed approach for text-driven localized object editing. The base object is a bulldozer, and each edit adjusts color, adds density, or removes density.

In this paper, authors from LG Electronics and Seoul National University put forth a technique for localized object editing that lets text prompts modify 3D objects, supporting full stylization as well as density-based localized editing. They argue that simply fine-tuning a single NeRF with a CLIP-driven objective, whether to create new densities in regions of low initial density or to modify existing densities, is inadequate for fully stylizing shapes and colors. Instead, they blend the original 3D object representation with a parameterized implicit 3D volumetric representation, training an editable NeRF so that the blended result renders naturally. To identify the region that a text prompt should alter, they use a pretrained vision-language segmentation model such as CLIPSeg. The proposed method is built on a layered NeRF architecture called Blending-NeRF, which comprises a pretrained NeRF and an editable NeRF.
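The sketch below illustrates how such a layered design might be wired up in PyTorch: a frozen pretrained NeRF and a trainable editable NeRF are queried at the same sample points, and only the editable branch receives gradients. This is a minimal illustration under stated assumptions, not the authors' code; the `TinyNeRF` and `LayeredNeRF` names, layer sizes, and input shapes are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNeRF(nn.Module):
    """Simplified NeRF MLP: maps a 3D point x and view direction d to (density, RGB)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        h = self.trunk(x)                                     # [N, hidden]
        sigma = F.softplus(self.density_head(h)).squeeze(-1)  # [N] non-negative density
        rgb = self.color_head(torch.cat([h, d], dim=-1))      # [N, 3] view-dependent color
        return sigma, rgb

class LayeredNeRF(nn.Module):
    """Frozen pretrained NeRF plus a trainable editable NeRF, queried at the same samples."""
    def __init__(self, pretrained: nn.Module, editable: nn.Module):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad_(False)   # the original scene stays fixed
        self.editable = editable      # only the edit layer is optimized

    def forward(self, x, d):
        sigma_o, rgb_o = self.pretrained(x, d)   # original scene fields
        sigma_e, rgb_e = self.editable(x, d)     # edit-layer fields
        return (sigma_o, rgb_o), (sigma_e, rgb_e)

# Example query: 1024 sample points with unit view directions.
model = LayeredNeRF(TinyNeRF(), TinyNeRF())
x = torch.randn(1024, 3)
d = F.normalize(torch.randn(1024, 3), dim=-1)
(original, edit) = model(x, d)
```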

Some prior work trains multiple NeRFs simultaneously to reconstruct the static and dynamic parts of a dynamic scene. In contrast, this method adds an extra NeRF to enable text-guided changes in specific regions of a pretrained static scene. The supported edits cover several operations, including color adjustment, density addition, and density removal. By blending density and color from the two NeRFs, the approach can localize and modify 3D objects at a fine-grained level.
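As a rough, hedged illustration of how density and color from the two fields could be combined for these operations, the function below adds the edit layer's density, removes at most the original density, and mixes colors in proportion to each layer's share of the result. The exact formulation in Blending-NeRF may differ; the weighting scheme, the `sigma_del` removal term, and the epsilon are assumptions.

```python
import torch

def blend_fields(sigma_o, rgb_o, sigma_e, rgb_e, sigma_del, eps=1e-8):
    """Illustrative blend of the original field (sigma_o, rgb_o) with an edit layer.

    sigma_o, sigma_e, sigma_del : [N] non-negative densities
        sigma_e   -- density the edit layer adds
        sigma_del -- density the edit layer asks to remove (capped at sigma_o)
    rgb_o, rgb_e : [N, 3] colors from the original and edit layers
    """
    removed = torch.minimum(sigma_del, sigma_o)   # cannot remove more than exists
    sigma_blend = sigma_o - removed + sigma_e     # density after removal and addition

    # Mix colors in proportion to each layer's contribution to the blended density.
    w_o = ((sigma_o - removed) / (sigma_blend + eps)).unsqueeze(-1)   # [N, 1]
    w_e = (sigma_e / (sigma_blend + eps)).unsqueeze(-1)               # [N, 1]
    rgb_blend = w_o * rgb_o + w_e * rgb_e
    return sigma_blend, rgb_blend
```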

They propose the Blending-NeRF architecture, which combines a pretrained NeRF with an editable NeRF using a range of objectives and training techniques.

Their contributions can be summarized as follows.

- The method makes it possible to edit parts of a 3D object intuitively while preserving its original appearance.

- They introduce new blending operations that quantify the amount of density addition, density reduction, and color modification. These operations let the approach target specific regions for localized editing and limit the extent of object alteration (see the sketch after this list).

- They run extensive experiments on text-guided 3D object editing, including shape and color modification, and compare their method against earlier approaches and their straightforward extensions, demonstrating that Blending-NeRF is qualitatively and quantitatively superior.
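To make the second bullet above concrete, the following hypothetical regularizers sketch one way to measure and limit density addition, density removal, and color change outside the text-targeted region. The loss terms, their equal weighting, and the `target_mask` argument (derived, for instance, from a CLIPSeg mask lifted onto the sampled rays) are illustrative assumptions, not the paper's actual objectives.

```python
import torch

def edit_extent_losses(sigma_e, sigma_del, rgb_o, rgb_e, target_mask):
    """Hypothetical regularizers on how much editing happens and where.

    sigma_e, sigma_del : [N] added / removed densities from the edit layer
    rgb_o, rgb_e       : [N, 3] original and edited colors
    target_mask        : [N] 1.0 where a sample projects into the text-selected
                         region, 0.0 elsewhere
    """
    outside = 1.0 - target_mask
    loss_add = (outside * sigma_e).mean()                              # no new geometry outside
    loss_del = (outside * sigma_del).mean()                            # no erased geometry outside
    loss_col = (outside.unsqueeze(-1) * (rgb_e - rgb_o).abs()).mean()  # color stays put outside
    # Equal weights here are an assumption; in practice they would be tuned.
    return loss_add + loss_del + loss_col
```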