This package provides a generic, simple and fast implementation of Deepmind's AlphaZero algorithm:
- The core algorithm is only 2,000 lines of pure, hackable Julia code.
- Generic interfaces make it easy to add support for new games or new learning frameworks.
- Being between one and two orders of magnitude faster than its Python alternatives, this implementation enables solving nontrivial games on a standard desktop computer with a GPU.
Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind's AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. We believe that this methodology can have exciting applications in many different research areas.
Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++) and optimized for highly distributed computing environments. This makes them hardly accessible for students, researchers and hackers.
Many simple Python implementations can be found on Github, but none of them is able to beat a reasonable baseline on games such as Othello or Connect Four. As an illustration, the benchmark in the README of the most popular of them only features a random baseline, along with a greedy baseline that does not appear to be significantly stronger.
AlphaZero.jl is designed to be as simple as those Python implementations. In addition, it is between one and two orders of magnitude faster, making it possible to solve nontrivial games on a standard desktop computer with a GPU. This gain comes mostly from two sources:
- Julia's inherent speed: most machine learning algorithms do not suffer much from being written in Python as most of the computation happens within heavily optimized matrix manipulation routines. This is not the case with AlphaZero, where tree search is also a possible bottleneck.
- An asynchronous MCTS implementation: even more importantly, a key aspect in making MCTS scale is to enable several workers to explore the search tree asynchronously. This is a huge win even on a single machine, as it enables to perform neural-network inference on large batches rather than evaluating board positions separately, thereby maximizing the GPU utilization (see benchmark).
This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-18-C-0092. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.