Piano Mastery Benchmarking for High Dimensional Control with Simulated Robot Hands

Measuring progress in control and reinforcement learning is challenging. Robust benchmarks that focus on high-dimensional control have been particularly underserved, including what is perhaps the ultimate “challenge problem” of high-dimensional robotics: mastering bi-manual (two-handed), multi-fingered control. At the same time, some benchmarking efforts in control and reinforcement learning have begun to aggregate and explore different aspects of depth. Despite decades of research into imitating the dexterity of the human hand, high-dimensional control remains a major difficulty for robots.

A group of researchers from UC Berkeley, Google, DeepMind, Stanford University, and Simon Fraser University presents ROBOPIANIST, a new benchmark suite for high-dimensional control. In it, a pair of simulated anthropomorphic robot hands is tasked with playing various songs, conditioned on sheet music provided as a Musical Instrument Digital Interface (MIDI) transcription. The hands have 44 actuators in total, 22 per hand, and, like human hands, are slightly underactuated.
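To make the idea of conditioning on a MIDI transcription concrete, here is a minimal sketch of how note events could be turned into a per-timestep table of which piano keys should be down. It is an illustration only, not the benchmark's own code: the `Note` class, the `piano_roll` function, and the fixed control timestep `dt` are all hypothetical. The one fact it relies on is that a standard 88-key piano covers MIDI note numbers 21 (A0) through 108 (C8).

```python
from dataclasses import dataclass
from typing import List

NUM_KEYS = 88          # keys on a standard piano
LOWEST_MIDI_NOTE = 21  # MIDI note number of A0, the lowest piano key


@dataclass(frozen=True)
class Note:
    """Hypothetical note event: MIDI pitch plus start and end time in seconds."""
    pitch: int
    start: float
    end: float


def piano_roll(notes: List[Note], dt: float, duration: float) -> List[List[bool]]:
    """Return roll[t][k] == True when key k should be held at timestep t."""
    num_steps = int(round(duration / dt))
    roll = [[False] * NUM_KEYS for _ in range(num_steps)]
    for note in notes:
        key = note.pitch - LOWEST_MIDI_NOTE
        if not 0 <= key < NUM_KEYS:
            continue  # pitch is outside the piano's range, so skip it
        first = max(int(round(note.start / dt)), 0)
        last = min(int(round(note.end / dt)), num_steps)
        for t in range(first, last):
            roll[t][key] = True
    return roll


# Middle C (MIDI note 60) held for the first 0.1 s of a 0.2 s clip.
roll = piano_roll([Note(pitch=60, start=0.0, end=0.1)], dt=0.05, duration=0.2)
```

A table like this gives a policy a goal at every control step, which is one plausible way to read "conditioned on sheet music".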

Playing a song well requires sequencing actions in ways that exhibit many of the qualities expected of high-dimensional control policies, including:

  1. Spatial and temporal precision
  2. Coordination of two hands and ten fingers
  3. Strategic planning of key presses to make other key presses easier
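One way to put a number on the first quality, spatial and temporal precision, is to compare the keys a policy holds down at each timestep with the keys the song calls for. The sketch below uses a per-timestep F1 score for this. Treat the choice of F1 as an assumption made for illustration, since the text above does not say which metric the benchmark uses. The function names are hypothetical as well.

```python
from typing import Sequence, Set


def key_press_f1(target: Set[int], played: Set[int]) -> float:
    """F1 score between the keys that should be down and the keys that are down."""
    if not target and not played:
        return 1.0  # correctly playing nothing counts as a perfect step
    true_positives = len(target & played)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(played)  # fraction of pressed keys that were wanted
    recall = true_positives / len(target)     # fraction of wanted keys that were pressed
    return 2 * precision * recall / (precision + recall)


def mean_key_press_f1(target_steps: Sequence[Set[int]],
                      played_steps: Sequence[Set[int]]) -> float:
    """Average the per-timestep F1 over a whole song."""
    if len(target_steps) != len(played_steps):
        raise ValueError("target and played sequences must have the same length")
    if not target_steps:
        raise ValueError("sequences must not be empty")
    total = sum(key_press_f1(t, p) for t, p in zip(target_steps, played_steps))
    return total / len(target_steps)


# Step 0: one of two target keys is hit. Step 1: the wrong key is hit.
score = mean_key_press_f1([{39, 43}, {39}], [{39}, {40}])
```

Because the score is computed per timestep, a correct key pressed one step late is penalized in both steps, so the same number reflects spatial errors (wrong key) and temporal errors (wrong moment).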

The original ROBOPIANIST-repertoire-150 benchmark comprises 150 songs, each serving as a standalone task. Through comprehensive experiments, the researchers study the performance envelope of model-free reinforcement learning (RL) and model-based model predictive control (MPC) methods. The results suggest that the resulting policies can produce strong performances, although much room for improvement remains.

How well a policy learns a song can be used to sort songs (i.e., tasks) by difficulty. The researchers believe that grouping tasks by such criteria can encourage further study in a range of areas related to robot learning, such as curriculum and transfer learning. ROBOPIANIST also opens up opportunities for a variety of research directions, such as imitation learning, multi-task learning, zero-shot generalization, and multimodal (sound, vision, and touch) learning. Overall, ROBOPIANIST offers a simple goal, an environment that is easy to replicate, clear evaluation criteria, and room for various extensions in the future.
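The idea of sorting songs by how well a policy learns them can be sketched in a few lines. The song names and scores below are made-up placeholders, not results from the benchmark, and `rank_by_difficulty` and `split_into_tiers` are hypothetical helpers. The sketch assumes each song has a single score in which higher means the policy played it better.

```python
from typing import Dict, List


def rank_by_difficulty(scores: Dict[str, float]) -> List[str]:
    """Order songs from easiest (highest score) to hardest (lowest score)."""
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


def split_into_tiers(ranked: List[str], num_tiers: int) -> List[List[str]]:
    """Split a ranked list into contiguous tiers, e.g. stages of a curriculum."""
    if num_tiers < 1:
        raise ValueError("num_tiers must be at least 1")
    tier_size = -(-len(ranked) // num_tiers)  # ceiling division
    if tier_size == 0:
        return []
    return [ranked[i:i + tier_size] for i in range(0, len(ranked), tier_size)]


# Placeholder scores for illustration only; these are not measured values.
placeholder_scores = {"song_a": 0.9, "song_b": 0.4, "song_c": 0.7, "song_d": 0.55}
ranked = rank_by_difficulty(placeholder_scores)
tiers = split_into_tiers(ranked, num_tiers=2)
```

A curriculum-learning experiment could then train on the first tier before moving to the second, while a transfer-learning study could ask whether skills learned on the easy tier speed up learning on the hard one.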