Nvidia Researchers Introduce BundleSDF: a Novel AI Method for Neural 6-DoF Object Tracking and 3D Reconstruction from a Monocular RGBD Video


Six-DoF (“degree of freedom”) pose tracking and 3D reconstruction of an unknown object from a monocular RGBD video are two fundamental, closely related problems in computer vision. Solving them would enable a wide range of applications in augmented reality, robotic manipulation, learning from demonstration, and sim-to-real transfer. Earlier work usually treats the two problems separately. For instance, neural scene representations have successfully produced realistic 3D object models.

However, these methods rely on ground-truth object masks and known camera poses. Complete 3D reconstruction is also impossible when a moving camera captures a static object (e.g., Figure 1 below: the bottom of an object resting on a table is never seen). On the other hand, instance-level 6-DoF object pose estimation and tracking algorithms typically need a textured 3D model of the test object in advance for pre-training or online template matching. Category-level methods can generalize to new object instances within the same category, but they struggle with out-of-distribution instances and unseen object categories.

To overcome these limitations, the researchers propose solving the two problems jointly. Their method is conceptually similar to earlier work in object-level SLAM, but it relaxes many common assumptions, which lets it handle occlusion, specularity, a lack of visual texture and geometric cues, and abrupt object motion. The approach assumes that the object is rigid and requires a 2D object mask in the first frame of the video. Apart from these two requirements, the object may move freely throughout the video, even under severe occlusion. The key components of the approach are an online pose graph optimization process, a concurrent Neural Object Field that reconstructs the 3D shape and appearance, and a memory pool that enables communication between the two processes. Figure 1 illustrates the robustness of the approach.

Figure 1: The technique reconstructs a 3D model of an unknown object from a monocular RGBD sequence and a 2D object mask (in the first frame only). It generalizes well, handling flat and untextured surfaces, specular highlights, thin structures, severe occlusion, and a range of interaction agents (human hand, body, robotic arm), all without any prior knowledge of the object or the interaction agent. The meshes shown are direct outputs of the method.
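For readers new to the term, a 6-DoF pose combines three rotational and three translational degrees of freedom and is commonly stored as a 4x4 rigid transform. The sketch below is only an illustration of that general idea and is not taken from the paper: the function names are hypothetical, and for brevity the rotation is restricted to a single yaw angle about the z-axis rather than a full 3D rotation.

```python
import math


def pose_from_yaw_translation(yaw_rad, tx, ty, tz):
    """Build a 4x4 rigid transform: rotate about z by yaw_rad, then translate.

    Hypothetical helper; a full 6-DoF pose would allow any 3D rotation.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b: apply pose b first, then pose a."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def invert(t):
    """Invert a rigid transform using R^T and -R^T * translation."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(t, p):
    """Map a 3D point p from the object frame into the camera frame."""
    return tuple(
        sum(t[i][k] * p[k] for k in range(3)) + t[i][3] for i in range(3)
    )
```

Tracking an object then amounts to estimating one such transform per video frame; the relative motion between two frames can be obtained with `compose(pose_b, invert(pose_a))`.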

Researchers from NVIDIA propose a new approach to 6-DoF object tracking and 3D reconstruction from a monocular RGBD video. The method only requires a segmentation of the object in the first frame. By running two concurrent threads, one for online pose graph optimization and one for the Neural Object Field representation, it handles difficult situations such as fast motion, partial and complete occlusion, lack of texture, and specular highlights. The authors report state-of-the-art results on several datasets compared with existing methods. Future research will focus on using shape priors to reconstruct unseen parts of the object.
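The two-thread design with a shared memory pool can be pictured with a small concurrency sketch. Everything here is an assumption made for illustration: the class `MemoryPool`, the functions `tracking_thread` and `field_thread`, and the placeholder arithmetic standing in for pose estimation and refinement are hypothetical and do not reflect the authors' implementation. The sketch only shows the communication pattern: one thread writes coarse per-frame poses into the pool, and a second thread reads them and writes refined poses back.

```python
import queue
import threading


class MemoryPool:
    """Thread-safe keyframe store shared by both threads (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = {}

    def put(self, frame_id, pose, refined):
        with self._lock:
            self._frames[frame_id] = {"pose": pose, "refined": refined}

    def get(self, frame_id):
        with self._lock:
            return dict(self._frames[frame_id])

    def snapshot(self):
        with self._lock:
            return {k: dict(v) for k, v in self._frames.items()}


def tracking_thread(num_frames, pool, todo):
    """Stand-in for online tracking: store a coarse pose for each frame."""
    for frame_id in range(num_frames):
        coarse_pose = float(frame_id)  # placeholder, not a real pose estimate
        pool.put(frame_id, coarse_pose, refined=False)
        todo.put(frame_id)  # notify the other thread about the new keyframe
    todo.put(None)  # sentinel: no more frames


def field_thread(pool, todo):
    """Stand-in for the neural field thread: refine poses and write them back."""
    while True:
        frame_id = todo.get()
        if frame_id is None:
            break
        entry = pool.get(frame_id)
        refined_pose = entry["pose"] + 0.5  # placeholder refinement step
        pool.put(frame_id, refined_pose, refined=True)


def run_demo(num_frames=5):
    pool = MemoryPool()
    todo = queue.Queue()
    workers = [
        threading.Thread(target=tracking_thread, args=(num_frames, pool, todo)),
        threading.Thread(target=field_thread, args=(pool, todo)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return pool.snapshot()
```

Calling `run_demo()` returns the final pool contents, with every frame marked as refined. A lock-protected pool plus a queue is just one simple way to decouple a fast tracking loop from a slower reconstruction loop.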

The following is a summary of their contributions: 

• A new method for causal 6-DoF pose tracking and 3D reconstruction of a novel (previously unseen) dynamic object.

• A hybrid SDF representation that deals with uncertain free space caused by challenges specific to the dynamic object-centric setting, such as noisy segmentation and external occlusions from interaction.

• Experiments on three public benchmarks demonstrate state-of-the-art performance against existing approaches.
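To give a rough intuition for the hybrid SDF idea in the second contribution, the sketch below assigns a supervision target to a single sample along a camera ray. This is an assumption-laden simplification, not the paper's formulation: the function name `hybrid_sdf_target`, the region names, and the default values of `truncation` and `epsilon` are all hypothetical. The point it illustrates is that samples on rays with an unreliable mask or missing depth are treated as "uncertain" and receive only a weak target, rather than being confidently labeled as empty space.

```python
def hybrid_sdf_target(sample_depth, observed_depth, in_mask,
                      truncation=0.01, epsilon=0.001):
    """Return (region, sdf_target) for one sample along a camera ray.

    sample_depth:   distance of the sample from the camera along the ray
    observed_depth: sensor depth for this pixel, or None if depth is missing
    in_mask:        True if the pixel lies inside the object segmentation
    All thresholds are hypothetical values chosen for illustration.
    """
    # Unreliable observation: the ray may be occluded or mis-segmented,
    # so only a small positive distance is suggested.
    if not in_mask or observed_depth is None:
        return ("uncertain_free_space", epsilon)

    signed = observed_depth - sample_depth  # positive in front of the surface

    # Well in front of the observed surface: truncated free-space value.
    if signed > truncation:
        return ("empty_space", truncation)

    # Inside the truncation band around the surface: use the signed distance.
    if signed >= -truncation:
        return ("near_surface", signed)

    # Far behind the observed surface: unobserved, so no supervision.
    return ("behind_surface", None)
```

For example, a sample on a ray whose pixel falls outside the mask is classified as `uncertain_free_space` regardless of its depth, which keeps a noisy segmentation from carving away parts of the object.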



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.