DeepMind’s New AI Is Able To Learn The Rules Of A Game As It Plays

By Daniel Nelson

Alphabet’s subsidiary DeepMind has recently developed an AI system capable of learning the rules of a game as it plays.

While DeepMind has previously created impressive AI models that can master chess, shogi, Go, and video games, those models must be provided with the rules of the game beforehand. As such, DeepMind’s new AI represents a notable improvement over previous AI algorithms that learn to play games via reinforcement learning.

In a paper recently published in the journal Nature, DeepMind detailed how their new AI system operates. The new AI, dubbed MuZero, is able to learn the rules of a game as it plays thanks to a principle called “look-ahead search”.

As reported by Engadget, MuZero uses look-ahead search to determine which moves should be executed based on the most likely responses from opponents.

When considering all the possible moves that could be made in games like chess, MuZero is able to prioritize, narrowing the options down to just the most likely and relevant moves. MuZero then learns from both successful and unsuccessful maneuvers.

Rather than model all possible factors, MuZero only considers the factors that are most relevant to the decision at hand. It takes the myriad potential variables that could be considered and distills them down to just the most salient, impactful features.

These features are represented in a tree-based search algorithm. The possibilities within the tree are then combined with a learned model based on the features of the test environment. The look-ahead search is carried out after the most relevant aspects of an environment have been identified.

In order to come to a final decision, three factors are considered. MuZero considers the outcome of the previous choice, the current position it occupies, and the potential actions that it can take next.
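The idea of planning with a model can be illustrated with a minimal sketch. This is not DeepMind's implementation: the hand-written `toy_model` below stands in for the model MuZero would learn, all names are hypothetical, and a plain exhaustive depth-limited search replaces MuZero's prioritized tree search. The three ingredients loosely mirror the factors above: the reward from a choice, the current state, and the actions available next.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a learned model: given a state and an action,
# it predicts the next state and the immediate reward.
Model = Callable[[int, int], Tuple[int, float]]


def toy_model(state: int, action: int) -> Tuple[int, float]:
    """Toy dynamics on a number line: reward 1.0 for landing on position 3."""
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def look_ahead(state: int, actions: Sequence[int], model: Model,
               depth: int, discount: float = 0.9) -> float:
    """Best discounted return reachable within `depth` imagined steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in actions:
        next_state, reward = model(state, action)
        future = look_ahead(next_state, actions, model, depth - 1, discount)
        best = max(best, reward + discount * future)
    return best


def choose_action(state: int, actions: Sequence[int], model: Model,
                  depth: int, discount: float = 0.9) -> int:
    """Pick the action whose imagined future looks best (requires depth >= 1)."""
    def score(action: int) -> float:
        next_state, reward = model(state, action)
        return reward + discount * look_ahead(
            next_state, actions, model, depth - 1, discount)
    return max(actions, key=score)
```

Calling `choose_action(0, [-1, 1], toy_model, 3)` searches three imagined steps ahead from position 0. The search never touches a real environment; it only queries the model, which is the part MuZero is described as learning for itself.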

This approach outperforms methods previously used by DeepMind, including basic look-ahead search and tree-based models. MuZero proved to be at least as good at chess, shogi, and Go as AlphaZero was.

When MuZero played the game Ms. Pac-Man, it was only able to consider around six or seven moves at a time. Despite this limit, the AI was still able to perform quite well. DeepMind also experimented with MuZero’s capabilities by limiting the number of simulations it could complete before it had to commit to a move. In general, the more time the program was given to consider possible moves, the better it performed.
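Why a larger simulation budget tends to help can be seen in a small sketch. The setup is an assumption made purely for illustration and is far simpler than MuZero's search: two hypothetical moves with fixed win probabilities, a noisy `simulate` function, and a budget split evenly between the moves.

```python
import random
from typing import Sequence


def simulate(action: str, rng: random.Random) -> float:
    """Hypothetical noisy simulation: 1.0 for a win, 0.0 otherwise."""
    win_prob = {"good_move": 0.6, "bad_move": 0.4}[action]
    return 1.0 if rng.random() < win_prob else 0.0


def pick_move(actions: Sequence[str], budget: int, rng: random.Random) -> str:
    """Split the simulation budget evenly and pick the best average outcome.

    Ties go to whichever move is listed first.
    """
    per_action = max(1, budget // len(actions))
    averages = {
        a: sum(simulate(a, rng) for _ in range(per_action)) / per_action
        for a in actions
    }
    return max(actions, key=lambda a: averages[a])


def accuracy(budget: int, trials: int = 500, seed: int = 0) -> float:
    """Fraction of trials in which the truly better move is chosen."""
    rng = random.Random(seed)
    hits = sum(
        pick_move(["bad_move", "good_move"], budget, rng) == "good_move"
        for _ in range(trials)
    )
    return hits / trials
```

Comparing `accuracy(2)` with `accuracy(200)` shows the effect: with one simulation per move the estimate is mostly noise, while a hundred simulations per move make the better option stand out far more reliably.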

The principal research scientist at DeepMind, David Silver, explained via TechXplore that MuZero is the first AI model able to generate its own representation of the rules of an environment, using that representation to plan out actions:

“For the first time, we actually have a system that is able to build its own understanding of how the world works and use that understanding to do this kind of sophisticated look-ahead planning that you’ve previously seen for games like chess.

(MuZero) can start from nothing, and just through trial and error, both discover the rules of the world and use those rules to achieve kind of superhuman performance.”

An AI that is genuinely able to learn the constraints of a task and operate within those constraints has a wide variety of possible applications. MuZero could be used for tasks like video compression, which has historically been difficult to automate using AI, owing to the many possible video formats and compression modes.

MuZero was able to achieve an approximately 5% compression improvement. This could have implications for the large number of videos hosted by Google and YouTube. Beyond videos, DeepMind is also looking into using the same MuZero techniques for protein architecture design and robotics programming.

According to Wendy Hall, professor of Computer Science at the University of Southampton, MuZero represents “a significant step forward” for reinforcement learning algorithms.

However, Hall is concerned that the algorithms could be misused. For instance, the US Air Force has already referenced early research papers covering MuZero to create an AI system that could launch missiles from U-2 spy planes.

This is despite DeepMind’s researchers having expressed their opposition to using their algorithms in any deadly weapon. They signed the Lethal Autonomous Weapons Pledge, which argues that any deadly technology should stay under human control.

Silver explained that DeepMind is looking ahead to the future, aiming to develop algorithms as powerful and versatile as the brain. The first step toward creating versatile, flexible algorithms is to understand what it means for a system to be intelligent, and intelligence is linked with an ability to discern the patterns and rules of a complex environment.

Originally published at Unite.Ai