A New Microsoft AI Research Shows How ChatGPT Can Convert Natural Language Instructions Into Executable Robot Actions


Recent developments in natural language processing have produced large language models (LLMs) that can comprehend and generate human-like language. Having learned from vast amounts of data, some LLMs can be adapted to specific tasks in a few-shot manner through conversation. ChatGPT is a good example of such an LLM. Robotics is one promising application area: ChatGPT can be used to translate natural language commands into executable code for controlling robots. Generating robot programs from natural language commands is a desirable goal, and there are several existing studies, some of which are based on LLMs.

Unfortunately, most of them were developed within a limited scope, are hardware-dependent, or lack human-in-the-loop functionality. In addition, most of this research relies on specific datasets, so adapting or extending it to other robotic scenarios requires collecting data again and retraining models. From a practical standpoint, an ideal robotic system would adapt easily to different applications or operating conditions without extensive data collection or model retraining. The advantage of using ChatGPT for robotic applications is that one can start from a small amount of sample data to adapt the model to a particular application, and use its language understanding and interaction capabilities as an interface.

Figure 1: Practical prompts allow ChatGPT to translate multi-step human instructions into executable robot action sequences that can be carried out in diverse environments.

Although ChatGPT’s potential for robotic applications is attracting attention, there is currently no established methodology for using it in practice. In this study, researchers from Microsoft provide a concrete example of how ChatGPT can be used in a few-shot setting to translate natural language commands into a sequence of actions that a robot can execute (Fig. 1). The prompts were designed to meet requirements common to many practical applications while remaining easy to customize.

The researchers set the following requirements for this work: 1) easy integration with robot execution systems or visual recognition programs; 2) applicability to various home environments; 3) the ability to provide an arbitrary number of natural language instructions while minimizing the impact of ChatGPT’s token limit. To meet these requirements, they designed input prompts that encourage ChatGPT to 1) output a sequence of predefined robot actions with explanations in a readable JSON format; 2) represent the operating environment in a formalized style; 3) infer and output the updated state of the operating environment, which can be reused as the next input, allowing ChatGPT to operate based solely on the memory of the latest operations. They then ran experiments to test how well the proposed prompts infer appropriate actions for multi-stage language instructions in various environments.
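The idea of reusing the updated environment as the next input can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the JSON field names (`task_sequence`, `environment_after`), the action strings, and the helper functions are all hypothetical, and the model is replaced by a stub that returns canned replies instead of calling ChatGPT.

```python
import json

def build_prompt(environment: dict, instruction: str) -> str:
    # Hypothetical prompt layout: current environment state plus one instruction.
    return (
        "Environment: " + json.dumps(environment, sort_keys=True) + "\n"
        "Instruction: " + instruction
    )

def parse_response(raw: str):
    # Assumed reply shape: a JSON object holding the action list and the
    # environment state after those actions. Field names are illustrative.
    data = json.loads(raw)
    return data["task_sequence"], data["environment_after"]

def run_instructions(instructions, environment, ask_model):
    # Each step sends only the latest environment state, so the prompt does
    # not grow with the full history of earlier instructions.
    all_actions = []
    for instruction in instructions:
        prompt = build_prompt(environment, instruction)
        actions, environment = parse_response(ask_model(prompt))
        all_actions.extend(actions)
    return all_actions, environment

def make_stub_model(replies):
    # Stand-in for a real model call: replays canned replies in order and
    # records the prompts it received so they can be inspected.
    pending = iter(replies)
    prompts = []
    def ask(prompt: str) -> str:
        prompts.append(prompt)
        return next(pending)
    return ask, prompts

CANNED_REPLIES = [
    json.dumps({"task_sequence": ["grasp(cup)"],
                "environment_after": {"cup": "in_hand"}}),
    json.dumps({"task_sequence": ["move_to(shelf)", "release(cup)"],
                "environment_after": {"cup": "on_shelf"}}),
]
```

With the stub in place, calling `run_instructions(["Pick up the cup.", "Put it on the shelf."], {"cup": "on_table"}, ask)` feeds the environment returned by the first reply into the second prompt, which is the mechanism that keeps each request short.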

They also noted that ChatGPT’s conversational ability lets users adjust its output with natural language feedback, which is crucial for developing an application that is both safe and robust while providing a user-friendly interface. The set of robot actions, the environment representation, and the object names in the proposed prompts can all be easily modified, so the prompts can serve as templates. The paper’s contribution is to establish and publish generic prompts that can be readily adapted to each experimenter’s needs, providing practical knowledge to the robotics research community. The prompts are open-source and freely accessible on GitHub, along with the source code for using them.
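To make the template idea concrete, here is a minimal Python sketch of a prompt whose action set and object names are filled in per experiment, followed by a user correction appended as natural language feedback. The template wording, the function names, and the conversation format are assumptions made for illustration; the actual prompts are the ones published by the researchers.

```python
from string import Template

# Hypothetical template text; only the idea of swappable actions and
# object names comes from the article.
PROMPT_TEMPLATE = Template(
    "You may only use these robot actions: $actions.\n"
    "Objects in the scene: $objects.\n"
    "Reply with a JSON action sequence."
)

def render_prompt(actions, objects) -> str:
    # Swap in a different action set or object list without touching the template.
    return PROMPT_TEMPLATE.substitute(
        actions=", ".join(actions),
        objects=", ".join(objects),
    )

def add_feedback(conversation, feedback: str):
    # Return a new conversation with the user's correction appended, so the
    # model can revise its previous answer on the next turn.
    return conversation + [("user", feedback)]

kitchen_prompt = render_prompt(
    actions=["grasp", "move_to", "release"],
    objects=["cup", "shelf", "table"],
)
conversation = [("user", kitchen_prompt), ("assistant", "<model reply here>")]
conversation = add_feedback(conversation, "Open the cabinet door before grasping.")
```

Adapting the sketch to another setting only requires passing different `actions` and `objects` lists to `render_prompt`.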


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.