Voyager is a powerful Minecraft agent built on GPT-4 and the first lifelong learning agent that plays Minecraft purely in context.


A central challenge facing artificial intelligence researchers today is building fully autonomous embodied agents that can plan, explore, and learn in open-ended environments. Traditional approaches train models through reinforcement learning (RL) and imitation learning over primitive actions, which makes systematic exploration, interpretability, and generalization difficult. Recent advances in agents based on large language models (LLMs) use the world knowledge encoded in pre-trained LLMs to generate consistent action plans or executable policies. Such agents have been applied to embodied tasks like games and robotics, as well as to NLP tasks without embodiment.

Voyager is the first LLM-powered embodied lifelong learning agent in Minecraft. It continuously explores the world, acquires new skills, and makes novel discoveries on its own, without human intervention. Voyager has three main components:

  1. An automatic curriculum that maximizes exploration.
  2. An ever-growing skill library of executable code for storing and retrieving complex behaviors.
  3. An iterative prompting mechanism that improves programs using environment feedback, execution errors, and self-verification.
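To make the first component more concrete, here is a minimal Python sketch of one curriculum step, assuming the LLM is treated as a black box that maps a prompt string to a reply. The `AgentState` fields, the prompt wording (a paraphrase, not Voyager’s actual prompt), and the `fake_llm` stub are all hypothetical; in the real system the reply would come from GPT-4.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Hypothetical snapshot of what the curriculum conditions on."""
    biome: str
    inventory: dict[str, int]
    completed_tasks: list[str] = field(default_factory=list)
    failed_tasks: list[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Summarize exploration progress and current state into a prompt."""
    items = ", ".join(f"{name} x{n}" for name, n in sorted(state.inventory.items()))
    return "\n".join([
        "Goal: discover as many diverse things as possible.",
        "Propose ONE next task that is new to you but achievable now.",
        f"Biome: {state.biome}",
        f"Inventory: {items or 'empty'}",
        f"Completed tasks: {', '.join(state.completed_tasks) or 'none'}",
        f"Failed tasks: {', '.join(state.failed_tasks) or 'none'}",
        "Reply with the task only.",
    ])


def propose_next_task(state: AgentState, llm: Callable[[str], str]) -> str:
    """Treat the LLM as a black box: prompt string in, task string out."""
    return llm(build_curriculum_prompt(state)).strip()


# Hypothetical stub standing in for GPT-4: suggests the next tech-tree step.
def fake_llm(prompt: str) -> str:
    if "stone_pickaxe" in prompt:
        return "Mine 1 iron_ore"
    return "Craft 1 stone_pickaxe\n"


state = AgentState(
    biome="forest",
    inventory={"cobblestone": 5, "stick": 2},
    completed_tasks=["Mine 1 oak_log", "Craft 1 wooden_pickaxe"],
)
print(propose_next_task(state, fake_llm))  # Craft 1 stone_pickaxe
```

Listing both completed and failed tasks in the prompt is one simple way to let the model steer toward novelty while still being able to return to tasks it could not finish earlier.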

Voyager interacts with GPT-4 through black-box queries, so no model parameters need to be fine-tuned. Because the skills Voyager develops are temporally extended, interpretable, and compositional, its abilities compound rapidly and catastrophic forgetting is alleviated. Empirically, Voyager shows a strong in-context lifelong learning capability and excels at playing Minecraft: it obtains 3.3 times more unique items, travels 2.3 times longer distances, and unlocks key tech tree milestones up to 15.3 times faster than the previous state of the art (SOTA). Voyager can also use its learned skill library in a new Minecraft world to solve novel tasks from scratch, whereas other methods struggle to generalize.

Because complex skills are synthesized by composing simpler programs, Voyager’s capabilities grow quickly, and the catastrophic forgetting that plagues other continual learning methods is mitigated. The automatic curriculum takes Voyager’s exploration progress and current state into account and proposes progressively harder tasks. GPT-4 generates this curriculum with the overarching goal of “discovering as many diverse things as possible,” an approach that can be read as an in-context form of novelty search. The skill library is built up incrementally from the action programs that successfully solve tasks. Each program is indexed by an embedding of its description, so it can be retrieved later in similar situations.
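The retrieval idea behind the skill library can be sketched as a small embedding index. In this hypothetical Python version, `embed` is a toy bag-of-words counter standing in for a learned text-embedding model, and the skill names and descriptions are made up for illustration; only the pattern (index each program by its description’s embedding, then fetch the nearest entries by cosine similarity) reflects the description above.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for a learned text embedding: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Skill:
    name: str
    description: str
    code: str  # the executable program (JavaScript in Voyager's case)


class SkillLibrary:
    def __init__(self) -> None:
        self._entries: list[tuple[Counter, Skill]] = []

    def add(self, skill: Skill) -> None:
        # The key is the embedding of the description, not of the code itself.
        self._entries.append((embed(skill.description), skill))

    def retrieve(self, query: str, k: int = 5) -> list[Skill]:
        # Rank stored skills by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [skill for _, skill in ranked[:k]]


library = SkillLibrary()
library.add(Skill("craftWoodenPickaxe", "Craft a wooden pickaxe from planks and sticks", "// program body omitted"))
library.add(Skill("smeltIronIngot", "Smelt raw iron into an iron ingot in a furnace", "// program body omitted"))
library.add(Skill("killSpider", "Kill a spider with a sword", "// program body omitted"))
print([s.name for s in library.retrieve("smelt iron in the furnace", k=1)])  # ['smeltIronIngot']
```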

However, LLMs struggle to consistently generate correct action code in a single attempt. To address this, the Voyager authors propose an iterative prompting mechanism that:

  • Executes the generated program to collect observations from the Minecraft simulation (such as the inventory and nearby creatures) and any error trace from the code interpreter.
  • Incorporates this feedback into GPT-4’s prompt for another round of code refinement.
  • Repeats the process until a self-verification module confirms that the task is complete, at which point the program is committed to the skill library.
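The loop described in the list above can be sketched as follows. This is a minimal illustration, not Voyager’s implementation: `generate`, `execute`, and `verify` are hypothetical callables standing in for GPT-4 code generation, running the program in Minecraft, and the self-verification module, and the default of four rounds is an arbitrary choice for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionResult:
    observation: str      # e.g. inventory contents and nearby creatures
    error: Optional[str]  # error trace from the code interpreter, if any


def iterative_prompting(
    task: str,
    generate: Callable[[str, str], str],                         # (task, feedback) -> program
    execute: Callable[[str], ExecutionResult],                   # run the program in the game
    verify: Callable[[str, ExecutionResult], tuple[bool, str]],  # -> (success, critique)
    library: dict[str, str],
    max_rounds: int = 4,
) -> bool:
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        program = generate(task, feedback)
        result = execute(program)
        # Only ask the verifier when the program ran without errors.
        success, critique = (False, "") if result.error else verify(task, result)
        if success:
            library[task] = program  # commit the verified program
            return True
        # Fold observations, errors, and critique into the next prompt.
        feedback = (
            f"Round {round_no} failed. Observation: {result.observation}. "
            f"Error: {result.error or 'none'}. Critique: {critique or 'none'}."
        )
    return False  # give up for now; the curriculum may retry the task later


# Hypothetical stubs: the "model" fixes its bug once the error trace appears in the feedback.
def fake_generate(task: str, feedback: str) -> str:
    return "fixed_program" if "ReferenceError" in feedback else "buggy_program"


def fake_execute(program: str) -> ExecutionResult:
    if program == "buggy_program":
        return ExecutionResult("inventory: {}", "ReferenceError: craftPlanks is not defined")
    return ExecutionResult("inventory: {'oak_planks': 4}", None)


def fake_verify(task: str, result: ExecutionResult) -> tuple[bool, str]:
    ok = "oak_planks" in result.observation
    return ok, ("" if ok else "No oak planks in the inventory.")


skills: dict[str, str] = {}
print(iterative_prompting("Craft 4 oak_planks", fake_generate, fake_execute, fake_verify, skills))  # True
print(skills)  # {'Craft 4 oak_planks': 'fixed_program'}
```

Only programs that pass verification reach the library, which keeps failed attempts from polluting later retrieval.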

The code and installation instructions are available on GitHub: https://github.com/MineDojo/Voyager

Limitations and Future Work

  • Cost. The GPT-4 API incurs significant costs: it is about 15 times more expensive than GPT-3.5. Nevertheless, Voyager depends on the leap in code generation quality that GPT-4 provides, which GPT-3.5 and open-source LLMs cannot match.
  • Inaccuracies. Despite the iterative prompting mechanism, the agent sometimes still gets stuck and fails to generate the correct skill; when that happens, the automatic curriculum can reattempt the task later. The self-verification module can also fail occasionally, for example by not recognizing spider string as a sign that a spider was successfully defeated.
  • Hallucinations. The automatic curriculum occasionally proposes tasks that are impossible to complete; for instance, it may ask the agent to craft a “copper sword” or a “copper chestplate,” neither of which exists in the game. Code generation also produces hallucinations. GPT-4 tends to use cobblestone as a furnace fuel even though it is not a valid fuel source in the game, and it may call functions that are absent from the provided control primitive APIs, causing code execution errors.
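Both kinds of code-generation hallucination can at least be detected cheaply before a program runs. The Python sketch below scans a generated JavaScript program with regular expressions; the primitive names in `ALLOWED_PRIMITIVES`, the `smeltItem(bot, item, fuel, ...)` argument order, and the partial `VALID_FUELS` list are assumptions for illustration, and regex scanning is a crude heuristic rather than a real JavaScript parser.

```python
import re

# Hypothetical primitive names; the real control primitive API may differ.
ALLOWED_PRIMITIVES = {"mineBlock", "craftItem", "smeltItem", "placeItem", "killMob"}
# Partial list of valid furnace fuels, for illustration only.
VALID_FUELS = {"coal", "charcoal", "oak_planks", "stick", "lava_bucket"}
JS_KEYWORDS = {"if", "for", "while", "switch", "catch", "function", "return"}


def unknown_calls(js_code: str) -> set[str]:
    """Bare function calls that are neither primitives nor defined locally."""
    defined = set(re.findall(r"\bfunction\s+([A-Za-z_$][\w$]*)", js_code))
    # An identifier followed by "(" and not preceded by "." (skips bot.chat(...)).
    called = set(re.findall(r"(?<![.\w$])([A-Za-z_$][\w$]*)\s*\(", js_code))
    return called - defined - ALLOWED_PRIMITIVES - JS_KEYWORDS


def invalid_fuels(js_code: str) -> list[str]:
    """Fuel arguments outside VALID_FUELS, assuming smeltItem(bot, item, fuel, ...)."""
    fuels = re.findall(r'smeltItem\(\s*\w+\s*,\s*"[^"]*"\s*,\s*"([^"]*)"', js_code)
    return [fuel for fuel in fuels if fuel not in VALID_FUELS]


# A hallucinated program: an invalid fuel and a call to a nonexistent helper.
generated = """
async function craftCopperSword(bot) {
  await mineBlock(bot, "copper_ore", 3);
  await smeltItem(bot, "raw_copper", "cobblestone", 3);
  await craftCopperItem(bot, "copper_sword");
  bot.chat("done");
}
"""
print(unknown_calls(generated))  # {'craftCopperItem'}
print(invalid_fuels(generated))  # ['cobblestone']
```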

The researchers are optimistic that future improvements to the GPT API models and new techniques for fine-tuning open-source LLMs will overcome these limitations. Voyager could serve as a starting point for building powerful generalist agents without fine-tuning model parameters, and it demonstrates a strong capacity for lifelong learning: the system builds an ever-growing library of reusable, interpretable, and generalizable action programs for individual tasks. Voyager excels at discovering new items, progressing through the Minecraft tech tree, exploring new territory, and applying its acquired knowledge to novel tasks in a freshly generated world.