Introducing AlphaZero

A PyTorch implementation of DeepMind's AlphaZero agent for playing two-player, zero-sum strategy board games such as Go and Gomoku.

By Michael Hu
June 14, 2022 10:30 am

We are excited to introduce our most recent project, AlphaZero, an open-source implementation of DeepMind's groundbreaking AlphaZero algorithm [1]. AlphaZero is an advancement over the original AlphaGo algorithm [2], which defeated the world's best Go players.

The project is implemented in PyTorch and provides comprehensive support for training, monitoring, and analyzing the AlphaZero agent, tailored specifically to Go and freestyle Gomoku. We hope it proves to be a useful contribution to the field.

Figure 1: Training statistics of AlphaZero agent on a 9x9 Go board

[UPDATE 2023-12-23]: The project is now part of my new book, The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementation with Python.

References

  • [1] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815, 2017.

  • [2] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484-489, 2016. doi:10.1038/nature16961.