Benchmark

AlphaZero.Benchmark — Module

Utilities to evaluate players against one another.

Typically, between training iterations, several players, some of which may depend on the current neural network, compete against a set of baselines.

Evaluations

AlphaZero.Benchmark.Evaluation — Type

Evaluation

Abstract type for a benchmark item specification.

AlphaZero.Benchmark.Single — Type

Single <: Evaluation

Evaluating a single player in a one-player game.

AlphaZero.Benchmark.Duel — Type

Duel <: Evaluation

Evaluating a player by pitting it against a baseline player in a two-player game.

AlphaZero.Benchmark.run — Function

Benchmark.run(env::Env, duel::Benchmark.Evaluation, progress=nothing)

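Run a benchmark evaluation and return a Report.Evaluation. Despite its name, the duel argument is typed as Benchmark.Evaluation, so it is not restricted to Duel.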

If a progress object is provided, next!(progress) is called after each simulated game.
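
The optional progress argument follows a common Julia pattern, which the sketch below reproduces in isolation. Everything in it is a stand-in: CountingProgress, its next! method, and run_games are hypothetical names, not part of AlphaZero. In practice the object could be a progress meter such as ProgressMeter.Progress, but that is an assumption, since the text above does not name a package.

```julia
# Hypothetical stand-in for a progress meter: it only counts calls to next!.
mutable struct CountingProgress
    done::Int
end
next!(p::CountingProgress) = (p.done += 1; p)

# Hypothetical game loop. `play_game` stands for whatever simulates one game
# and returns its reward.
function run_games(play_game, num_games, progress=nothing)
    rewards = Float64[]
    for _ in 1:num_games
        push!(rewards, play_game())
        # Report progress only when the caller supplied an object.
        isnothing(progress) || next!(progress)
    end
    return rewards
end

meter = CountingProgress(0)
rewards = run_games(() -> 1.0, 5, meter)
```

Defaulting to nothing keeps the caller free to skip progress reporting without a separate code path.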

Players

AlphaZero.Benchmark.Player — Type

Benchmark.Player

Abstract type to specify a player that can be featured in a benchmark duel.

Subtypes must implement the following functions:

Benchmark.instantiate(player, nn): instantiate the player specification into an AbstractPlayer given a neural network
Benchmark.name(player): return a String describing the player
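
As an illustration of this two-function interface, here is a minimal self-contained sketch. It uses local stand-in types so that it runs without AlphaZero loaded. In real code, the stand-ins would be AlphaZero's AbstractPlayer and Benchmark.Player, and the two methods would extend Benchmark.instantiate and Benchmark.name. UniformPlayer and UniformBaseline are hypothetical names.

```julia
# Local stand-ins for the abstract types named in the documentation.
abstract type AbstractPlayer end
abstract type PlayerSpec end   # plays the role of Benchmark.Player

# Hypothetical concrete player that does not use a network.
struct UniformPlayer <: AbstractPlayer end

# Hypothetical player specification with one descriptive field.
struct UniformBaseline <: PlayerSpec
    label::String
end

# Turn the specification into a player. The network `nn` is accepted to
# satisfy the interface but is unused by this baseline.
instantiate(spec::UniformBaseline, nn) = UniformPlayer()

# Return a String describing the player.
name(spec::UniformBaseline) = "Uniform ($(spec.label))"

spec = UniformBaseline("sanity check")
player = instantiate(spec, nothing)
println(name(spec))
```

Keeping the specification separate from the instantiated player lets the same specification be reused with a different network at each iteration.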

AlphaZero.Benchmark.Full — Type

Benchmark.Full(params) <: Benchmark.Player

Full AlphaZero player that combines MCTS with the learnt network.

Argument params has type MctsParams.

AlphaZero.Benchmark.NetworkOnly — Type

Benchmark.NetworkOnly(;τ=1.0) <: Benchmark.Player

Player that uses the policy output by the learnt network directly, instead of relying on MCTS.

AlphaZero.Benchmark.MctsRollouts — Type

Benchmark.MctsRollouts(params) <: Benchmark.Player

Pure MCTS baseline that uses rollouts to evaluate new positions.

Argument params has type MctsParams.

AlphaZero.Benchmark.MinMaxTS — Type

Benchmark.MinMaxTS(;depth, τ=0.) <: Benchmark.Player

Minmax baseline, which relies on MinMax.Player.

Minmax Baseline

AlphaZero.MinMax — Module

A simple implementation of the minmax tree search algorithm, to be used as a baseline against AlphaZero. Heuristic board values are provided by the GameInterface.heuristic_value function.

AlphaZero.MinMax.Player — Type

MinMax.Player <: AbstractPlayer

A stochastic minmax player, to be used as a baseline.

MinMax.Player(;depth, amplify_rewards, τ=0.)

The minmax player explores the game tree exhaustively down to the depth given by the depth argument to build an estimate of the Q-value of each available action. Then, it chooses an action as follows:

If there are winning moves (with value Inf), one of them is picked uniformly at random.
If all moves are losing (with value -Inf), one of them is picked uniformly at random.

Otherwise,

If the temperature τ is zero, a move is picked uniformly among those with maximal Q-value (there is usually only one choice).
If the temperature τ is nonzero, the probability of choosing action $a$ is proportional to $e^{\frac{q_a}{Cτ}}$, where $q_a$ is the Q-value of action $a$ and $C$ is the maximum absolute value of all finite Q-values. Dividing by $C$ makes the decision invariant to rescaling of GameInterface.heuristic_value.

If the amplify_rewards option is set to true, every received positive reward is converted to $∞$ and every negative reward is converted to $-∞$.
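
The selection rule above can be sketched as a function that maps Q-value estimates to a probability distribution over actions. This is an illustration written from the description, not the library's implementation: minmax_policy is a hypothetical name, sampling from the distribution is left out, and the guard for $C = 0$ and the subtraction of the maximum exponent (for numerical stability) are assumptions added here.

```julia
# `qs` holds one Q-value estimate per available action (possibly ±Inf).
function minmax_policy(qs::Vector{Float64}; τ::Real=0.0)
    # Uniform distribution over the actions selected by a Boolean mask.
    uniform(mask) = mask ./ count(mask)

    winning = qs .== Inf
    any(winning) && return uniform(winning)                # some move wins
    all(qs .== -Inf) && return uniform(trues(length(qs)))  # every move loses

    # Zero temperature: split the mass evenly among the best moves.
    τ == 0 && return uniform(qs .== maximum(qs))

    # Nonzero temperature: weights proportional to exp(q / (C τ)).
    C = maximum(abs, filter(isfinite, qs))
    C == 0 && (C = 1.0)              # assumption: avoid dividing by zero
    z = qs ./ (C * τ)
    w = exp.(z .- maximum(z))        # exp(-Inf) == 0: losing moves get no mass
    return w ./ sum(w)
end

# Rescaling all Q-values by a constant leaves the distribution unchanged.
p1 = minmax_policy([1.0, 2.0]; τ=1.0)
p2 = minmax_policy([10.0, 20.0]; τ=1.0)
```

Because every exponent is divided by $C$, p1 and p2 agree up to floating-point error, which is the invariance property stated above.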
