Robot Dog Moonwalking in the Style of Michael Jackson: Using Rewards to Bridge LLMs and an Optimization-Based Motion Controller


Artificial Intelligence has grown rapidly in recent times. With new research and models released almost every day, the field keeps evolving and improving. Whether in healthcare, education, marketing, or business, Artificial Intelligence and Machine Learning practices are beginning to transform how industries operate. Large Language Models (LLMs), one of the best-known advances in AI, are being adopted by almost every organization. Famous LLMs like GPT-3.5 and GPT-4 have demonstrated impressive adaptability to new contexts, enabling tasks like logical reasoning and code generation from only a handful of hand-crafted examples.

Researchers have also looked into using LLMs to improve robotic control. Applying LLMs to robotics is difficult because low-level robot operations are hardware-dependent and frequently underrepresented in LLM training data. Previous approaches have either treated LLMs as semantic planners or relied on human-crafted control primitives to interface with robots. To address these challenges, Google DeepMind researchers have introduced a new paradigm that uses the flexibility and optimizability of reward functions to carry out a variety of robotic tasks.

Reward functions act as an intermediate interface that LLMs define and that can later be optimized to drive robot control strategies. Their semantic richness makes them well suited to specification by LLMs, since they can efficiently connect high-level language commands or corrections with low-level robot behaviors. The team notes that operating at this higher level of abstraction, with reward functions as the interface between language and low-level robot actions, was inspired by the observation that human language instructions often describe behavioral outcomes rather than specific low-level actions. By connecting instructions to rewards, it becomes easier to bridge the gap between language and robot behaviors, because rewards capture the rich semantics of the desired outcome.
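To make this concrete, below is a minimal, hypothetical sketch of what an LLM-specified reward could look like for an instruction such as "make the robot dog moonwalk." The `RewardTerm`, `RewardSpec`, and `total_reward` names, the particular terms, and the numeric targets and weights are all illustrative assumptions rather than the paper's actual reward API; the point is only that language maps to desired outcomes, and outcomes map to a scalar that an optimizer can maximize.

```python
from dataclasses import dataclass, field

# Hypothetical reward terms an LLM might emit for an instruction such as
# "make the robot dog moonwalk": each term scores one behavioral outcome,
# and the motion optimizer only ever sees the combined scalar reward.
@dataclass
class RewardTerm:
    name: str      # name of an observed quantity (e.g. torso height)
    target: float  # desired value of that quantity
    weight: float  # relative importance of this term

@dataclass
class RewardSpec:
    terms: list = field(default_factory=list)

def total_reward(spec: RewardSpec, observations: dict) -> float:
    """Weighted sum of negative squared errors between observations and targets."""
    reward = 0.0
    for term in spec.terms:
        error = observations[term.name] - term.target
        reward -= term.weight * error ** 2
    return reward

# Example: a spec an LLM could plausibly produce from a natural-language command.
# All values below are made up for illustration.
spec = RewardSpec(terms=[
    RewardTerm(name="torso_height", target=0.26, weight=1.0),     # stay standing
    RewardTerm(name="base_velocity_x", target=-0.5, weight=2.0),  # glide backwards
    RewardTerm(name="foot_slide", target=0.1, weight=0.5),        # allow sliding feet
])

obs = {"torso_height": 0.24, "base_velocity_x": -0.3, "foot_slide": 0.2}
print(total_reward(spec, obs))  # the scalar a motion optimizer would maximize
```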

The MuJoCo MPC (Model Predictive Control) real-time optimizer is used in this paradigm to enable interactive behavior creation. Because users can observe outcomes immediately and give the system feedback, the iterative refinement process improves. For evaluation, the researchers designed a set of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. The method reliably accomplished 90% of the designed tasks, whereas a baseline that uses primitive skills as the interface with Code as Policies completed only 50% of them. Experiments on a real robot arm were also conducted to test the method's effectiveness; there, the interactive system demonstrated complex manipulation skills such as non-prehensile pushing.
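The interactive loop can be sketched roughly as follows. This is a schematic under stated assumptions: `llm_to_reward` and `mpc_rollout` are hypothetical stand-ins for the LLM reward translator and a real-time optimizer such as MuJoCo MPC, and none of these names come from the released implementation.

```python
# A minimal sketch of the interactive refinement loop described above.
# `llm_to_reward(instruction, current_spec)` is assumed to return updated
# reward parameters, and `mpc_rollout(reward_spec)` is assumed to synthesize
# a motion trajectory for the given reward in real time.
def interactive_session(llm_to_reward, mpc_rollout, initial_instruction):
    reward_spec = llm_to_reward(initial_instruction, current_spec=None)
    while True:
        trajectory = mpc_rollout(reward_spec)  # optimizer produces behavior in real time
        show_to_user(trajectory)               # user watches the robot / simulation
        feedback = input("Correction (empty to accept): ")
        if not feedback:
            return reward_spec                 # user is satisfied with the behavior
        # Language corrections ("lift the front legs higher") are folded back
        # into the reward parameters rather than into low-level actions.
        reward_spec = llm_to_reward(feedback, current_spec=reward_spec)

def show_to_user(trajectory):
    # Placeholder: in practice this would render the rollout in a viewer.
    print(f"Rollout of {len(trajectory)} steps ready for review.")
```

The key design choice reflected here is that user feedback never edits robot actions directly; it only reshapes the reward, and the optimizer re-derives the behavior.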

In conclusion, this is a promising approach in which LLMs define reward parameters that are then optimized for robotic control. The combination of LLM-generated rewards and real-time optimization yields an interactive, feedback-driven behavior creation process that lets users achieve complex robotic behaviors more efficiently and effectively.