Introducing MuZero

A PyTorch implementation of DeepMind's MuZero agent to do planning with a learned model

By Michael Hu
July 6, 2022 10:00 pm
2 min read

We are excited to introduce our most recent project, MuZero, an open-source implementation of DeepMind's MuZero algorithm [1], an advancement over the famous AlphaZero algorithm [2]. In contrast to the AlphaZero agent, which is limited to turn-based, two-player, zero-sum games, the MuZero agent can also play single-player games, such as Atari games. In addition, the MuZero agent plans with a learned model of the environment rather than a perfect simulator, relaxing the requirement of knowing the environment's rules and making it suitable for a wider range of real-world problems.
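At the core of MuZero are three learned functions: a representation function that encodes an observation into a hidden state, a dynamics function that advances the hidden state given an action (and predicts the reward), and a prediction function that outputs a policy and value from a hidden state. The sketch below illustrates this structure in PyTorch; the class name, layer sizes, and method names are illustrative assumptions for a small control task like CartPole, not the project's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MuZeroNets(nn.Module):
    """Minimal sketch of MuZero's three learned functions (illustrative only)."""

    def __init__(self, obs_dim=4, num_actions=2, hidden_dim=32):
        super().__init__()
        self.num_actions = num_actions
        # h: representation function, raw observation -> hidden state
        self.representation = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # g: dynamics function, (hidden state, one-hot action) -> next hidden state
        self.dynamics_state = nn.Sequential(
            nn.Linear(hidden_dim + num_actions, hidden_dim), nn.ReLU())
        # g also predicts the immediate reward of the transition
        self.dynamics_reward = nn.Linear(hidden_dim + num_actions, 1)
        # f: prediction function, hidden state -> (policy logits, value)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def initial_inference(self, obs):
        """Encode an observation and predict policy/value at the search root."""
        state = self.representation(obs)
        return state, self.policy_head(state), self.value_head(state)

    def recurrent_inference(self, state, action):
        """Unroll the learned model one step inside the search tree."""
        a = F.one_hot(action, self.num_actions).float()
        x = torch.cat([state, a], dim=-1)
        next_state = self.dynamics_state(x)
        reward = self.dynamics_reward(x)
        return (next_state, reward,
                self.policy_head(next_state), self.value_head(next_state))


net = MuZeroNets()
obs = torch.randn(1, 4)                      # a single CartPole-sized observation
state, policy, value = net.initial_inference(obs)
next_state, reward, next_policy, next_value = net.recurrent_inference(
    state, torch.tensor([1]))
```

During search, only `initial_inference` ever sees a real observation; the tree is expanded entirely through `recurrent_inference`, which is what lets MuZero plan without access to the environment's rules.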

The project is implemented in PyTorch and provides comprehensive support for training and monitoring. For those interested, the source code can be found in this repository.

Figure 1: Training statistics of the MuZero agent on Tic-Tac-Toe, a classic turn-based, two-player, zero-sum game.

Figure 2: Training statistics of the MuZero agent on the CartPole classic control task.

References

  • [1]

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588:604–609, December 2020. http://dx.doi.org/10.1038/s41586-020-03051-4

  • [2]

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815, 2017.