Towards General-Purpose Minecraft AI Agents

A preliminary baseline for a neuro-symbolic AI agent achieving open-ended goals in the Minecraft environment

Artificial General Intelligence is a fascinating field, which promises AI systems with broad applicability. Both theoretical ideas and their software implementations require practical validation to prove their viability. Earlier, the OpenCog cognitive architecture was used in applications such as humanoid robots (for example, Grace, the elder care robot from Awakening Health), biomedical informatics, and others. A new version, OpenCog Hyperon, is being designed and developed with similar applications in mind.

In this series of blog posts, we introduce one more testbed for discussing, evaluating, and experimenting with our AGI concepts and design. Namely, we use Minecraft as an open-ended and complex environment which, we believe, can lead to interesting and important conclusions about promising approaches to AGI.

Measuring Advances in AGI

Evaluating incremental progress in AGI has always been tricky and remains so despite all the achievements in this area. Benchmark tasks which are too narrow force competitors to apply their human intelligence to inventing problem-specific solutions. More complex or general tasks still favor one approach to AGI over another and have a considerable domain-specific component, which incentivizes optimization of a favored AI model, rather than driving innovation of more generally intelligent systems. If the task is too complex, competition revolves around tweaking and tuning existing technologies until the next technology is invented, with a disputable contribution to progress in AGI.

For example, Natural Language Understanding (NLU) models such as ERNIE 3.0 or T5 show amazing results, which are already superhuman on the SuperGLUE benchmark (a successor to GLUE, the General Language Understanding Evaluation). But it makes little sense to ask whether these models can play chess, or whether they could at least be integrated with, say, MuZero to discuss possible modifications of the game rules. It may turn out that the latter requires completely different architectures.

Evaluating the integral behavior of autonomous agents such as robots may sound more appropriate in the context of AGI, and it does have its merits. However, navigation in the physical world is an important but still domain-specific skill. Working in virtual worlds can reduce the contribution of low-level sensorimotor skills to overall agent performance and can partly shift the focus to higher-level cognitive capabilities.

End-to-end learning to play 50+ Atari games via deep reinforcement learning (RL) without game-specific priors was very impressive (DeepMind, Agent57, 2020). But what we learned from this is that mastering such reactive games with fixed goals (reward functions) doesn’t require AGI and is amenable to relatively simple models. The settings of such benchmarks and challenges favor certain classes of models, which manifest reflex-like behavior. In turn, excelling at StarCraft does require significant strategic planning, but AlphaStar achieved this only via a considerable amount of highly StarCraft-specific design.

Although solving any task which looks challenging from the current AI standpoint can teach us something, we wanted to define a challenge that does not overly favor a specific class of methods (e.g. symbolic or neural), is not solvable by domain-specific hacking, and still allows visible incremental progress. To show general intelligence, the agent must be challenged to achieve an arbitrary, distant goal in a rich environment.

Researchers interested in more open-ended, creativity-oriented, and higher-level, reasoning-focused approaches toward general intelligence have looked to the Minecraft game environment as an alternative. Minecraft (through the Malmo platform) provides a different sort of challenging environment for AI agents, one which is both high-dimensional and requires active cognitive control (not just reactive responses to local situations, but compositional long-term activities).

Minecraft environment for AI Evaluation, precedents

Using Minecraft as a testbed for AI systems is not new, although attempts to do this are not numerous. Two recent challenges should be mentioned in this regard:

  • MarLÖ 2018 — Multi-Agent Reinforcement Learning in Minecraft Competition (an extension of the Malmo Collaborative AI Challenge);
  • MineRL Competition 2020 and 2021.

Both competitions are biased towards reinforcement learning, but approach it differently. MarLÖ provides simplified Minecraft worlds as environments (like a fenced area of 5x5 blocks) with tasks like chasing a pig, which do not seem too difficult for RL. However, the challenge is for multiple agents to collaborate in achieving a common goal by recognizing each other’s intents and strategies just from their in-game behavior.

MarLÖ Competition

While the challenge itself is interesting, its restrictions are not very natural. Humans learn to communicate using language in different settings. Direct communication between agents is absent from MarLÖ not because it would be non-AGI-ish or less beneficial for collaborative behavior, but because it doesn’t fit the subsymbolic RL methodology.

MineRL, in turn, doesn’t involve multi-agent collaboration, but considers large Minecraft environments with a distant goal. The diamond competition consisted of acquiring a diamond, which requires first cutting some wood, then crafting wood planks and a wooden pickaxe, mining cobblestone, crafting a stone pickaxe, and digging deep while avoiding lava. This goal is far from the most distant one in Minecraft, but deep RL agents fail to learn how to achieve it without hints. MineRL underlines that deep RL in games such as Go or StarCraft requires years of play time to achieve superhuman results, which is not feasible at all in the MineRL challenge. To work around this problem, the competition proposes to train RL agents using records of human player walkthroughs (this is one of the largest imitation learning datasets, with over 60 million frames of recorded human player data).

The agent also receives auxiliary rewards for obtaining prerequisite items in addition to a high reward for obtaining a diamond. Moreover, in the navigation subtask, additional rewards are given for approaching the provided location. That is, the agent is trained step by step to navigate and to gather one prerequisite item after another.
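The reward scheme described above can be sketched roughly as follows. This is a minimal illustration, not the competition's actual implementation: the item names, the reward values, and the function names `shaped_reward` and `navigation_bonus` are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set

# Hypothetical milestone table: the values are illustrative only and are
# NOT taken from the competition rules.
MILESTONE_REWARDS: Dict[str, float] = {
    "log": 1.0,
    "planks": 2.0,
    "wooden_pickaxe": 8.0,
    "cobblestone": 16.0,
    "stone_pickaxe": 32.0,
    "diamond": 1024.0,
}


def shaped_reward(inventory: Iterable[str], rewarded: Set[str]) -> float:
    """Return the reward for milestone items seen for the first time.

    `rewarded` is mutated so that each milestone pays out only once
    per episode.
    """
    reward = 0.0
    for item in inventory:
        if item in MILESTONE_REWARDS and item not in rewarded:
            reward += MILESTONE_REWARDS[item]
            rewarded.add(item)
    return reward


def navigation_bonus(prev_dist: float, new_dist: float, scale: float = 0.1) -> float:
    """Dense bonus proportional to the decrease in distance to the target."""
    return scale * (prev_dist - new_dist)
```

Note that the table itself encodes the order of prerequisites: the designer, not the agent, decides which intermediate items matter for the final goal.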

Diamond Competition

Although there are deep RL models that achieve the final goal, they are intensively guided to do this. They don’t learn to plan in order to achieve arbitrary distant goals; rather they merely learn to imperatively execute a very concrete sequence of steps. Also, while imitation learning is generally valuable for AGI, it is used here as an alternative to giving agents a simple symbolic description of the diamond mining task primarily because it corresponds so closely to the subsymbolic RL methodology. For a human player in the same situation, of course, a simple linguistic (symbolic) description of the task would normally be given, rather than purely evoking the task by examples.

More recently, the BASALT competition introduced more interesting and less formalized tasks such as “Find Cave”, “Make Waterfall”, “Create Village Animal Pen”, and “Build Village House”. Unfortunately, these are still very concrete goals. Of course, it is amazing if agents can achieve these goals in randomly generated Minecraft worlds, but they learn to imitate sequences of human actions without understanding them. This is not too far from reactive behavior in Atari games, albeit in a more complex environment. In order to train such RL agents for any other task (be it building a cave instead of finding it, or finding a waterfall instead of making it), a huge amount of training data must be gathered first, and the agent must be retrained. This is obviously not an elegant or sustainable solution to complex problem solving by AI agents.

We consider Minecraft a more interesting environment for early-stage AGI systems than these prior Minecraft challenges demonstrate. We believe that the most exciting features of the Minecraft world are its openness and creativity. Our agents will explore this world freely and achieve different kinds of goals. We don’t want to restrict the way Minecraft agents are designed and trained, what prior information they use, etc. However, we feel the Minecraft environment can be leveraged better via AI systems incorporating explicit knowledge representation and reasoning. In this way, instead of adjusting the Minecraft tasks pursued to match the limitations of current RL systems, we intend to solve more complex tasks in a Minecraft context using a neural-symbolic approach.

We are intentionally not introducing formal metrics for evaluating agents, because doing so is difficult in the settings we are interested in and would also encourage metric-specific optimizations, which are not interesting in the AGI context. Thus, Minecraft serves as an experimentation platform (actually, one of several) rather than a challenge or competition.

First step towards general-purpose Minecraft agents

We are developing a Minecraft agent with open-ended behavior capable of achieving arbitrary goals: discovering new blocks, mobs, biomes and other places, crafting new items, building novel structures, and collaborating with other agents. Of course, the agent’s capabilities are being implemented gradually, and it was natural to start with finding blocks and obtaining items, as in the diamond challenge. In contrast to that challenge, however, we didn’t restrict our agent to one goal, and thus neither vanilla reinforcement learning nor imitation learning was applicable. The two main classes of goals our agent can deal with are searching for new blocks and items, and acquiring specified items. However, there is no strict boundary between them, since goals such as “find any novel block and mine it” are also possible, and, in turn, obtaining a certain item may require free exploration.

This video demonstrates these capabilities of our preliminary agent (running in the standard Malmo AI-experimentation platform on Minecraft, which simplifies certain aspects like crafting and environment sensing):

As can be seen, it has some basic exploration capabilities, so it can find new types of blocks and get new items. However, even if we focus on such open-ended, intrinsically motivated behavior, we still need our agent to be able to achieve specific goals. Imagine that the agent has discovered an iron ore block and wants to mine it, but doesn’t have a stone pickaxe at the moment. Either from prior knowledge or by trial and error, it can learn that bare hands or wooden tools are insufficient for mining iron ore. Obtaining a stone pickaxe requires achieving subgoals such as cutting wood, crafting a wooden pickaxe, and mining cobblestone.
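One way to picture this kind of subgoal decomposition is backward chaining over a table of prerequisites. The sketch below is only an illustration under strong assumptions, not a description of our agent: the `RECIPES` table is hypothetical and heavily simplified, items are treated as either owned or not (no quantities, no consumption), and the `plan` function is a name introduced here for the example.

```python
from typing import Dict, List, Set

# Hypothetical, heavily simplified prerequisite table: item -> prerequisites.
# Items with no prerequisites are gathered directly from the world.
RECIPES: Dict[str, List[str]] = {
    "log": [],
    "planks": ["log"],
    "stick": ["planks"],
    "wooden_pickaxe": ["planks", "stick"],
    "cobblestone": ["wooden_pickaxe"],  # mining stone needs the tool
    "stone_pickaxe": ["cobblestone", "stick"],
    "iron_ore": ["stone_pickaxe"],
}


def plan(goal: str, inventory: Set[str]) -> List[str]:
    """Return an ordered list of subgoals that ends with `goal`.

    Items already in `inventory` are skipped, so the same goal yields
    different plans in different situations.
    """
    steps: List[str] = []
    done = set(inventory)

    def visit(item: str) -> None:
        if item in done:
            return
        for prereq in RECIPES[item]:  # KeyError for items not in the table
            visit(prereq)
        done.add(item)
        steps.append(item)

    visit(goal)
    return steps


print(plan("iron_ore", set()))
print(plan("iron_ore", {"wooden_pickaxe"}))
```

With an empty inventory the plan starts from cutting wood; with a wooden pickaxe already in hand it starts from mining cobblestone. The point of the sketch is that the order of steps is derived from the goal and the current situation rather than fixed in advance.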

In different situations the agent may find a need for different items or for achieving different subgoals, such as approaching some point or finding a biome. This differs from following a linear order of steps toward a predefined goal (such as obtaining a diamond). For a general-purpose agent, such goals can be subgoals of each other and of exploratory behavior. One could argue that they need not be represented explicitly. However, multi-agent collaboration presupposes that agents can request concrete things from each other. And if we consider such agents as companions of human players, then posing explicit tasks like “Bring me some coal ore” would be quite useful. It is still disputable whether the agent should have an explicit, interpretable internal representation of such goals, or whether we should train it to achieve goals described in natural language in an end-to-end, black-box fashion. But at least we can agree that the capability of achieving arbitrary goals defined at runtime is useful (at least for assisting human players).
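To make the idea of explicit goals that can become subgoals of each other more concrete, here is a minimal sketch of one possible representation. It is an assumption made purely for illustration: the classes `Acquire` and `Explore` and the function `next_subgoals` are hypothetical names, and nothing here is meant to describe the internal representation our agent actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Acquire:
    """Goal: obtain a specified item or block, e.g. on a player's request."""
    item: str


@dataclass(frozen=True)
class Explore:
    """Goal: roam the world until a block of the given type is discovered."""
    looking_for: str


Goal = Union[Acquire, Explore]
Position = Tuple[int, int, int]


def next_subgoals(goal: Goal, known_blocks: Dict[str, Position]) -> List[Goal]:
    """Expand a goal into subgoals; an empty list means 'act on it directly'."""
    if isinstance(goal, Acquire) and goal.item not in known_blocks:
        # No known location yet: exploration becomes a subgoal, and the
        # original goal is retried afterwards.
        return [Explore(looking_for=goal.item), goal]
    return []


# A request such as "Bring me some coal ore", posed at runtime:
request = Acquire("coal_ore")
print(next_subgoals(request, known_blocks={}))
print(next_subgoals(request, known_blocks={"coal_ore": (10, 12, -4)}))
```

Because goals are plain data, they can be created at runtime by a human player or by another agent, and an acquisition goal can fall back on exploration when the agent does not yet know where to look.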

This capability (in its preliminary form) is demonstrated in the second part of the video. The agent crafts a lever and a wooden door, and obtains an iron pickaxe after first achieving a few more subgoals. This result can be considered a sort of baseline to be improved on. Although we intentionally don’t introduce metrics, we will compare different approaches on this motivating task in future blog posts.

Future Plans

SingularityNET is designed not only to be valuable now, but also to foster the emergence of increasingly powerful benevolent general intelligence. Achieving this ultimate goal requires a framework applicable to the coordination of the numerous AI services deployed on the platform, which may require some sort of AI services language.

However, coordinating the skills of agents, in particular in the Minecraft world, is not too different. An AGI framework should be useful for any of these and other applications, which is the focus of TrueAGI within the SingularityNET ecosystem. We are developing OpenCog Hyperon as such a platform, and thus creating Minecraft agents is one of several motivating tasks for it. The upcoming blog posts will be devoted not only to progress in the capabilities of our Minecraft agent, but also to discussion of insights about AGI design.

Towards General-Purpose Minecraft AI Agents was originally published in SingularityNET on Medium, where people are continuing the conversation by highlighting and responding to this story.