Benchmark
AlphaZero.Benchmark — Module
Utilities to evaluate players against one another.
Typically, between training iterations, players that may depend on the current neural network compete against a set of baselines.
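For instance, a benchmark can be declared as a vector of duels pitting the current agent against simple baselines. The sketch below is illustrative: it assumes an MctsParams value built with a num_iters_per_turn keyword (an assumption about that type's fields), and uses the player and duel types documented further down.

```julia
using AlphaZero

# Illustrative MCTS settings; num_iters_per_turn is an assumed field name.
mcts_params = MctsParams(num_iters_per_turn=400)

# A benchmark: two duels to be run between training iterations.
benchmark = [
  Benchmark.Duel(
    Benchmark.Full(mcts_params),          # full AlphaZero player (MCTS + network)
    Benchmark.MctsRollouts(mcts_params);  # pure MCTS baseline
    num_games=100),
  Benchmark.Duel(
    Benchmark.NetworkOnly(),              # raw network policy, no search
    Benchmark.MinMaxTS(depth=5);          # minmax baseline
    num_games=100)]
```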
AlphaZero.Benchmark.Report — Type
Benchmark.Report = Vector{Benchmark.DuelOutcome}
A benchmark report is a vector of Benchmark.DuelOutcome objects.
Duels
AlphaZero.Benchmark.Duel — Type
Benchmark.Duel(player, baseline; num_games)
Specify a duel consisting of num_games games between player and baseline, both of type Benchmark.Player.
Optional keyword arguments
- reset_every: if set, the MCTS tree is reset every reset_every games to avoid running out of memory
- color_policy: has type ColorPolicy and is ALTERNATE_COLORS by default
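For illustration, here is a hedged sketch of a duel using both optional keyword arguments. It reuses mcts_params from the earlier sketch; whether ALTERNATE_COLORS must be qualified (e.g. as AlphaZero.ALTERNATE_COLORS) depends on what the package exports and is treated as an assumption here.

```julia
duel = Benchmark.Duel(
  Benchmark.Full(mcts_params),
  Benchmark.MctsRollouts(mcts_params);
  num_games=200,
  reset_every=50,                 # reset the MCTS tree every 50 games
  color_policy=ALTERNATE_COLORS)  # the default: alternate colors between games
```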
AlphaZero.Benchmark.DuelOutcome — Type
Benchmark.DuelOutcome
The outcome of a duel between two players.
Fields
- player and baseline are String fields containing the names of both players involved in the duel
- avgz is the average reward collected by player
- redundancy is the ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
- rewards is a vector containing all rewards collected by player (one per game played)
- time is the computing time spent running the duel, in seconds
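As an illustration, the hypothetical helper below prints the fields documented above.

```julia
# Hypothetical helper: summarize a duel outcome using its documented fields.
function summarize(outcome::Benchmark.DuelOutcome)
  println("$(outcome.player) vs $(outcome.baseline)")
  println("  average reward: $(outcome.avgz)")
  println("  games played:   $(length(outcome.rewards))")
  println("  redundancy:     $(outcome.redundancy)")
  println("  duration (s):   $(outcome.time)")
end
```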
AlphaZero.Benchmark.run — Function
Benchmark.run(env::Env, duel::Benchmark.Duel, progress=nothing)
Run a benchmark duel and return a Benchmark.DuelOutcome.
If a progress indicator progress is provided, next!(progress) is called after each simulated game.
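For example, assuming the progress object follows the ProgressMeter.jl interface (an assumption consistent with the next! call above), the duels of a benchmark can be run and their outcomes collected into a report. env, benchmark and the per-duel game count are carried over from the earlier sketches.

```julia
using ProgressMeter  # provides Progress and next!

# Run every duel of the benchmark and collect the outcomes into a report.
report = Benchmark.Report()
for duel in benchmark
  progress = Progress(100)  # one tick per simulated game (100 games per duel)
  push!(report, Benchmark.run(env, duel, progress))
end
```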
Players
AlphaZero.Benchmark.Player — Type
Benchmark.Player
Abstract type to specify a player that can be featured in a benchmark duel.
Subtypes must implement the following functions:
- Benchmark.instantiate(player, nn): instantiate the player specification into an AbstractPlayer given a neural network
- Benchmark.name(player): return a String describing the player
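As an illustration, here is a sketch of a custom benchmark player satisfying this interface. MyGame is a hypothetical game type; the specification wraps the MinMax.Player documented further down and ignores the neural network argument.

```julia
# Hypothetical baseline specification that does not use the network.
struct MinMaxBaseline <: Benchmark.Player
  depth :: Int
end

# The network argument nn is ignored: this baseline relies on minmax search only.
Benchmark.instantiate(p::MinMaxBaseline, nn) =
  MinMax.Player{MyGame}(depth=p.depth, amplify_rewards=true)

Benchmark.name(p::MinMaxBaseline) = "MinMax (depth $(p.depth))"
```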
AlphaZero.Benchmark.Full — Type
Benchmark.Full(params) <: Benchmark.Player
Full AlphaZero player that combines MCTS with the learnt network.
Argument params has type MctsParams.
AlphaZero.Benchmark.NetworkOnly — Type
Benchmark.NetworkOnly(;use_gpu=true, τ=1.0) <: Benchmark.Player
Player that uses the policy output by the learnt network directly, instead of relying on MCTS.
AlphaZero.Benchmark.MctsRollouts — Type
Benchmark.MctsRollouts(params) <: Benchmark.Player
Pure MCTS baseline that uses rollouts to evaluate new positions.
Argument params has type MctsParams.
AlphaZero.Benchmark.MinMaxTS — Type
Benchmark.MinMaxTS(;depth, τ=0.) <: Benchmark.Player
Minmax baseline, which relies on MinMax.Player.
AlphaZero.Benchmark.Solver — Type
Benchmark.Solver(;ϵ) <: Benchmark.Player
Perfect solver that plays randomly with probability ϵ.
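Side by side, the concrete specifications above can be instantiated as follows. This is a sketch with illustrative parameter values; mcts_params is the value from the earlier sketch.

```julia
alphazero = Benchmark.Full(mcts_params)          # MCTS guided by the learnt network
net_only  = Benchmark.NetworkOnly(τ=0.5)         # raw network policy, no search
rollouts  = Benchmark.MctsRollouts(mcts_params)  # MCTS with random rollouts
minmax    = Benchmark.MinMaxTS(depth=5, τ=0.2)   # depth-5 minmax baseline
solver    = Benchmark.Solver(ϵ=0.05)             # perfect play, 5% random moves
```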
Minmax Baseline
AlphaZero.MinMax — Module
A simple implementation of the minmax tree search algorithm, to be used as a baseline against AlphaZero. Heuristic board values are provided by the GameInterface.heuristic_value function.
AlphaZero.MinMax.Player — Type
MinMax.Player{Game} <: AbstractPlayer{Game}
A stochastic minmax player, to be used as a baseline.
MinMax.Player{Game}(;depth, amplify_rewards, τ=0.)
The minmax player explores the game tree exhaustively at depth depth to build an estimate of the Q-value of each available action. Then, it chooses an action as follows:
- If there are winning moves (with value Inf), one of them is picked uniformly at random.
- If all moves are losing (with value -Inf), one of them is picked uniformly at random.

Otherwise,

- If the temperature τ is zero, a move is picked uniformly among those with maximal Q-value (there is usually only one choice).
- If the temperature τ is nonzero, the probability of choosing action $a$ is proportional to $e^{\frac{q_a}{Cτ}}$, where $q_a$ is the Q-value of action $a$ and $C$ is the maximum absolute value of all finite Q-values, making the decision invariant to rescaling of GameInterface.heuristic_value.
If the amplify_rewards option is set to true, every received positive reward is converted to $∞$ and every negative reward is converted to $-∞$.
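The snippet below illustrates the nonzero-temperature selection rule described above on finite Q-values. It is a standalone sketch, not the package's internal implementation; winning and losing moves (±Inf) are assumed to have been handled separately, as described in the rules above.

```julia
# Compute action probabilities proportional to exp(q_a / (C * τ)),
# where C is the maximum absolute value of the (finite) Q-values.
function selection_probabilities(qvalues::Vector{Float64}, τ::Float64)
  C = maximum(abs.(qvalues))
  # If all Q-values are zero, every action is equally good.
  C == 0 && return fill(1 / length(qvalues), length(qvalues))
  scores = exp.(qvalues ./ (C * τ))
  return scores ./ sum(scores)
end

selection_probabilities([0.5, 0.4, -0.2], 0.3)  # strongly favors the first action
```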