Utilities to evaluate players against one another.

Typically, between training iterations, several players (some of which may depend on the current neural network) compete against a set of baselines.



AlphaZero.Benchmark.run — Function

Benchmark.run(env::Env, duel::Benchmark.Evaluation, progress=nothing)

Run a benchmark duel and return a Report.Evaluation.

If a progress indicator is provided, next!(progress) is called after each simulated game.
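As a usage sketch: env and duel are assumed to come from the surrounding training setup, num_games is a hypothetical field name, and the ProgressMeter package is assumed for the progress indicator.

```julia
# Hedged sketch, not verbatim package usage: `env` and `duel` are assumed to
# exist from the training setup; `num_games` is a hypothetical field name.
using ProgressMeter  # provides Progress and next!

progress = Progress(duel.num_games)          # one tick per simulated game
report = Benchmark.run(env, duel, progress)  # returns a Report.Evaluation
```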




AlphaZero.Benchmark.Player — Type

Abstract type to specify a player that can be featured in a benchmark duel.

Subtypes must implement the following functions:

  • Benchmark.instantiate(player, nn): instantiate the player specification into an AbstractPlayer, given a neural network
  • Benchmark.name(player): return a String describing the player
AlphaZero.Benchmark.NetworkOnly — Type

Benchmark.NetworkOnly(;τ=1.0) <: Benchmark.Player

Player that uses the policy output by the learnt network directly, instead of relying on MCTS.
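For instance, a raw-network baseline might be declared and instantiated as follows (a sketch: nn is assumed to be the current neural network from the training loop).

```julia
# Sketch: a baseline that plays directly from the network policy, with a
# softer temperature than the default. `nn` is an assumed neural network.
baseline = Benchmark.NetworkOnly(τ=0.5)
player = Benchmark.instantiate(baseline, nn)  # yields an AbstractPlayer
```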


Minmax Baseline

MinMax.Player <: AbstractPlayer

A stochastic minmax player, to be used as a baseline.

MinMax.Player(;depth, amplify_rewards, τ=0.)

The minmax player exhaustively explores the game tree to depth depth to build an estimate of the Q-value of each available action. Then, it chooses an action as follows:

  • If there are winning moves (with value Inf), one of them is picked uniformly at random.
  • If all moves are losing (with value -Inf), one of them is picked uniformly at random.
  • If the temperature τ is zero, a move is picked uniformly at random among those with maximal Q-value (there is usually only one).
  • If the temperature τ is nonzero, the probability of choosing action $a$ is proportional to $e^{\frac{q_a}{Cτ}}$, where $q_a$ is the Q-value of action $a$ and $C$ is the maximum absolute value of all finite Q-values. This normalization makes the decision invariant to rescaling of GameInterface.heuristic_value.

If the amplify_rewards option is set to true, every received positive reward is converted to $∞$ and every negative reward is converted to $-∞$.
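The selection rule above can be sketched in isolation as follows. pick_action and its arguments are illustrative names, not part of the package API.

```julia
# Illustrative, self-contained sketch of the action selection rule described
# above; `pick_action`, `qs` and `τ` are hypothetical names.
function pick_action(qs::Vector{Float64}, τ::Float64)
    winning = findall(==(Inf), qs)
    isempty(winning) || return rand(winning)          # a winning move exists
    all(==(-Inf), qs) && return rand(eachindex(qs))   # every move loses
    if τ == 0
        return rand(findall(==(maximum(qs)), qs))     # argmax, ties broken at random
    end
    # Normalize by the maximum absolute finite Q-value so the choice is
    # invariant to rescaling of the heuristic values.
    C = max(maximum(abs, filter(isfinite, qs)), 1e-12)
    ws = exp.(qs ./ (C * τ))    # exp(-Inf) == 0: losing moves get zero weight
    r = rand() * sum(ws)        # sample index a with probability ws[a] / sum(ws)
    acc = 0.0
    for (a, w) in enumerate(ws)
        acc += w
        acc >= r && return a
    end
    return length(qs)           # numerical fallback
end
```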