Robots are rapidly entering mainstream culture, yet they are typically limited in their abilities because of their programming. Despite the potential benefits of incorporating recent AI advancements into robot design, progress in developing general-purpose robots remains sluggish due to the time required to acquire real-world training data.
Building robots that can learn many tasks at once, and that combine the comprehension of language models with the practical abilities of a helper robot, has been the subject of extensive study.
DeepMind’s RoboCat is the first agent that can solve and adapt to various tasks on several types of real robots. Findings show that RoboCat learns significantly more quickly than other cutting-edge models. Because it learns from such a huge and varied dataset, it can pick up a new skill with as few as 100 demonstrations. This capacity is crucial to developing a multipurpose robot and will hasten robotics research by reducing human-supervised training requirements.
DeepMind’s multimodal model Gato (Spanish for “cat”) is the foundation for RoboCat, as it can process language, images, and actions in both simulated and physical settings. The team combined Gato’s architecture with a large training dataset of image and action sequences from various robot arms solving hundreds of different tasks. After this initial training phase, the team put RoboCat through a “self-improvement” training cycle with a new set of tasks. Each new task was learned in five stages:
- Collecting 100 to 1,000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
- Fine-tuning RoboCat on the new task/arm to produce a specialized spin-off agent.
- Having the spin-off agent practice the new task/arm about 10,000 times, generating more training data.
- Merging the demonstration data and the self-generated data into RoboCat’s existing training dataset.
- Retraining RoboCat using the updated dataset.
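The five stages above can be sketched as a simple loop. This is only a toy illustration of the data flow, not DeepMind's implementation: the `Agent` class and the functions `collect_demonstrations`, `fine_tune`, `practice`, and `self_improvement_cycle` are hypothetical names, trajectories are stood in for by plain strings, and no actual learning takes place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical stand-in for an agent: tracks only a version and its data."""
    version: int
    dataset: List[str] = field(default_factory=list)


def collect_demonstrations(task: str, n: int) -> List[str]:
    # Stage 1: human-controlled demonstrations (placeholder strings here).
    return [f"demo:{task}:{i}" for i in range(n)]


def fine_tune(base: Agent, demos: List[str]) -> Agent:
    # Stage 2: create a spin-off agent specialised on the new task/arm.
    return Agent(version=base.version, dataset=base.dataset + demos)


def practice(spin_off: Agent, task: str, n_episodes: int) -> List[str]:
    # Stage 3: the spin-off agent would act here; this toy version only
    # fabricates placeholder trajectories to show where the data comes from.
    return [f"self:{task}:{i}" for i in range(n_episodes)]


def self_improvement_cycle(robocat: Agent, task: str,
                           n_demos: int = 100,
                           n_practice: int = 10_000) -> Agent:
    demos = collect_demonstrations(task, n_demos)
    spin_off = fine_tune(robocat, demos)
    self_generated = practice(spin_off, task, n_practice)
    # Stage 4: merge demonstrations and self-generated data into the dataset.
    merged = robocat.dataset + demos + self_generated
    # Stage 5: "retrain" by producing a new agent version on the merged data.
    return Agent(version=robocat.version + 1, dataset=merged)


agent = Agent(version=0, dataset=["base:trajectory"])
agent = self_improvement_cycle(agent, "pick_up_gear")
print(agent.version, len(agent.dataset))  # prints: 1 10101
```

Each pass through the cycle returns a new agent whose dataset is larger than the one it started from, which is the sense in which the training feeds on its own output.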
Thanks to all of this training, the latest version of RoboCat is based on a dataset containing millions of trajectories from real and simulated robotic arms, including data generated by the system itself. The vision-based data depicting the tasks RoboCat would be trained to perform was collected using four distinct robot types and many robotic arms.
RoboCat learned to operate several different robotic arms within a few hours. Although it had been trained on arms with two-pronged grippers, it adapted to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
After observing 1,000 human-controlled demonstrations, collected in a matter of hours, RoboCat controlled this new arm deftly enough to pick up gears 86% of the time. With the same number of demonstrations, it learned tasks that require both precision and understanding, such as picking the right fruit out of a bowl and solving a shape-matching puzzle.
RoboCat’s training is self-reinforcing: the more it learns, the better it becomes at learning. The team shows that after learning from 500 demonstrations per task, the original version of RoboCat succeeded only 36% of the time on tasks it had never seen before. The most recent version, trained on a wider variety of tasks, had twice that success rate.
The team believes that RoboCat will pave the way for a new generation of more helpful, general-purpose robotic agents because it can learn autonomously and acquire skills rapidly, especially across different robotic devices.