Benchmark
AlphaZero.Benchmark
— Module
Utilities to evaluate players against one another.
Typically, between each training iteration, different players that possibly depend on the current neural network compete against a set of baselines.
AlphaZero.Benchmark.Report
— Type
Benchmark.Report = Vector{Benchmark.DuelOutcome}
A benchmark report is a vector of Benchmark.DuelOutcome objects.
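Since a report is a plain Julia vector, the usual collection functions apply directly. The snippet below is only a sketch; report is assumed to be a Benchmark.Report obtained from a previous evaluation.

```julia
# Sketch: `report` is assumed to be a `Benchmark.Report` produced earlier.
# Sorting by average reward shows which duel went best for the evaluated player.
ranked = sort(report, by = o -> o.avgz, rev = true)
```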
Duels
AlphaZero.Benchmark.Duel
— Type
Benchmark.Duel(player, baseline; num_games)
Specify a duel that consists of num_games games between player and baseline, each of them of type Benchmark.Player.
Optional keyword arguments
- reset_every: if set, the MCTS tree is reset every reset_mcts_every games to avoid running out of memory
- color_policy: has type ColorPolicy and is ALTERNATE_COLORS by default
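For illustration, here is one way such a duel could be declared. The name mcts_params is a placeholder for an MctsParams value from your configuration and the numeric settings are arbitrary; the Benchmark.Full and Benchmark.MctsRollouts players are documented below.

```julia
# Illustrative sketch only. `mcts_params` is a placeholder for an `MctsParams`
# value; the numeric settings are arbitrary.
player   = Benchmark.Full(mcts_params)           # MCTS guided by the learnt network
baseline = Benchmark.MctsRollouts(mcts_params)   # network-free MCTS with rollouts
duel     = Benchmark.Duel(player, baseline; num_games=100)
```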
AlphaZero.Benchmark.DuelOutcome
— Type
Benchmark.DuelOutcome
The outcome of a duel between two players.
Fields
- player and baseline are String fields containing the names of both players involved in the duel
- avgz is the average reward collected by player
- redundancy is the ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
- rewards is a vector containing all rewards collected by player (one per game played)
- time is the computing time spent running the duel, in seconds
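As a sketch of how these fields might be inspected (assuming outcome is a Benchmark.DuelOutcome):

```julia
# Sketch: printing a one-line summary of a duel outcome.
println("$(outcome.player) vs $(outcome.baseline): ",
        "avgz = $(outcome.avgz) over $(length(outcome.rewards)) games, ",
        "redundancy = $(outcome.redundancy), time = $(outcome.time)s")
```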
AlphaZero.Benchmark.run
— Function
Benchmark.run(env::Env, duel::Benchmark.Duel, progress=nothing)
Run a benchmark duel and return a Benchmark.DuelOutcome.
If a progress object is provided, next!(progress) is called after each simulated game.
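The next!(progress) convention matches ProgressMeter.jl, so a Progress object can be passed directly. The sketch below assumes env and duel are already defined and that the duel exposes a num_games field; treat those details as assumptions rather than guarantees.

```julia
# Sketch, assuming `env::Env` and `duel::Benchmark.Duel` are defined.
using ProgressMeter
progress = Progress(duel.num_games)   # assumes the duel exposes a `num_games` field
outcome  = Benchmark.run(env, duel, progress)
```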
Players
AlphaZero.Benchmark.Player
— Type
Benchmark.Player
Abstract type to specify a player that can be featured in a benchmark duel.
Subtypes must implement the following functions:
- Benchmark.instantiate(player, nn): instantiate the player specification into an AbstractPlayer given a neural network
- Benchmark.name(player): return a String describing the player
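As a sketch, a custom specification wrapping the MinMax.Player baseline documented below could look as follows (this roughly mirrors the built-in Benchmark.MinMaxTS). The game type is carried in the specification since instantiate only receives the network; Game is a placeholder for your game type.

```julia
# Sketch of a custom benchmark player specification. `Game` is assumed to be
# your game type; it is stored in the type since `instantiate` only gets `nn`.
struct ShallowMinMax{Game} <: Benchmark.Player
  depth :: Int
end

Benchmark.instantiate(p::ShallowMinMax{Game}, nn) where Game =
  MinMax.Player{Game}(depth=p.depth)   # the network is ignored by this baseline

Benchmark.name(p::ShallowMinMax) = "MinMax (depth $(p.depth))"
```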
AlphaZero.Benchmark.Full
— Type
Benchmark.Full(params) <: Benchmark.Player
Full AlphaZero player that combines MCTS with the learnt network.
Argument params has type MctsParams.
AlphaZero.Benchmark.NetworkOnly
— Type
Benchmark.NetworkOnly(;use_gpu=true) <: Benchmark.Player
Player that uses the policy output by the learnt network directly, instead of relying on MCTS.
AlphaZero.Benchmark.MctsRollouts
— Type
Benchmark.MctsRollouts(params) <: Benchmark.Player
Pure MCTS baseline that uses rollouts to evaluate new positions.
Argument params has type MctsParams.
AlphaZero.Benchmark.MinMaxTS
— Type
Benchmark.MinMaxTS(;depth, τ=0.) <: Benchmark.Player
Minmax baseline, which relies on MinMax.Player.
AlphaZero.Benchmark.Solver
— Type
Benchmark.Solver(;ϵ) <: Benchmark.Player
Perfect solver that plays randomly with probability ϵ.
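Putting the pieces together, a benchmark, that is, a vector of duels pitting the current agent against several of the baselines above, might be assembled as in the following sketch. Again, mcts_params and the numeric settings are placeholders.

```julia
# Illustrative sketch: a benchmark as a vector of duels against several baselines.
make_duel(baseline) =
  Benchmark.Duel(Benchmark.Full(mcts_params), baseline; num_games=100)

benchmark = [
  make_duel(Benchmark.NetworkOnly()),              # raw network policy, no search
  make_duel(Benchmark.MctsRollouts(mcts_params)),  # MCTS with random rollouts
  make_duel(Benchmark.MinMaxTS(depth=4))]          # fixed-depth minmax
```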
Minmax Baseline
AlphaZero.MinMax
— Module
A simple implementation of the minmax tree search algorithm, to be used as a baseline against AlphaZero. Heuristic board values are provided by the GameInterface.heuristic_value function.
AlphaZero.MinMax.Player
— Type
MinMax.Player{Game} <: AbstractPlayer{Game}
A stochastic minmax player, to be used as a baseline.
MinMax.Player{Game}(;depth, τ=0.)
The minmax player explores the game tree exhaustively at depth depth to build an estimate of the Q-value of each available action. Then, it chooses an action as follows:
- If there are winning moves (with value Inf), one of them is picked uniformly at random.
- If all moves are losing (with value -Inf), one of them is picked uniformly at random.

Otherwise,

- If the temperature τ is zero, a move is picked uniformly among those with maximal Q-value (there is usually only one choice).
- If the temperature τ is nonzero, the probability of choosing action $a$ is proportional to $e^{\frac{q_a}{Cτ}}$ where $q_a$ is the Q-value of action $a$ and $C$ is the maximum absolute value of all finite Q-values, making the decision invariant to rescaling of GameInterface.heuristic_value (see the sketch below).
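The following numerical sketch spells out the nonzero-temperature rule. It assumes all Q-values are finite, since infinite values are handled by the first two rules above.

```julia
# Sketch of the nonzero-temperature rule: P(a) ∝ exp(q_a / (C * τ)),
# where C is the maximum absolute value of the (finite) Q-values.
function minmax_policy(qs::Vector{Float64}, τ::Float64)
  C = maximum(abs.(qs))
  w = exp.(qs ./ (C * τ))
  return w ./ sum(w)
end

minmax_policy([0.5, 0.4, -1.0], 0.2)   # ≈ [0.62, 0.38, 0.00]
```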