Players

Player Interface

AlphaZero.AbstractPlayer — Type

AbstractPlayer{Game}

Abstract type for a player of Game.

AlphaZero.think — Function

think(::AbstractPlayer, game)

Return a probability distribution over actions as a (actions, π) pair.

AlphaZero.select_move — Function

select_move(player::AbstractPlayer, game, turn_number)

Return a single action. A default implementation is provided that samples an action according to the distribution computed by think, with a temperature given by player_temperature.
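The default behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use AlphaZero.jl itself: the helper names apply_temperature and sample_action are hypothetical, and the way the temperature is applied (raising probabilities to the power 1/τ and renormalizing, with τ = 0 treated as a greedy choice) is an assumption about a common convention rather than a statement about the library's internals.

```julia
using Random

# Hypothetical helper: sharpen or flatten a distribution π with temperature τ.
# τ = 1 leaves π unchanged; τ = 0 is treated as a greedy (argmax) choice.
function apply_temperature(π::AbstractVector{<:Real}, τ::Real)
    if iszero(τ)
        greedy = zeros(Float64, length(π))
        greedy[argmax(π)] = 1.0
        return greedy
    end
    scaled = Float64.(π) .^ (1 / τ)
    return scaled ./ sum(scaled)
end

# Hypothetical helper: draw one action from an (actions, π) pair
# by walking the cumulative distribution.
function sample_action(rng::AbstractRNG, actions::AbstractVector, π::AbstractVector{<:Real})
    u = rand(rng) * sum(π)
    acc = 0.0
    for (a, p) in zip(actions, π)
        acc += p
        u <= acc && return a
    end
    return last(actions)  # guard against floating-point round-off
end

actions = [:left, :stay, :right]
policy = [0.2, 0.5, 0.3]
chosen = sample_action(MersenneTwister(0), actions, apply_temperature(policy, 0.5))
```

A temperature below 1 concentrates the probability mass on the most likely actions, while a temperature above 1 makes the choice closer to uniform.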

AlphaZero.reset_player! — Function

reset_player!(::AbstractPlayer)

Reset the internal memory of a player (e.g. the MCTS tree). The default implementation does nothing.

AlphaZero.player_temperature — Function

player_temperature(::AbstractPlayer, game, turn_number)

Return the player temperature, given the number of actions that have been played before by both players in the current game.

A default implementation is provided that always returns 1.
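As an illustration of a temperature that depends on the turn number, here is a small standalone sketch of a piecewise-constant schedule. The type StepTemperature and the function temperature_at are hypothetical names introduced for this example; they are not part of AlphaZero.jl and do not reproduce its schedule types.

```julia
# Hypothetical piecewise-constant schedule: pairs of (first turn, temperature),
# sorted by turn. An illustration only, not AlphaZero.jl's own schedule type.
struct StepTemperature
    steps::Vector{Tuple{Int, Float64}}
end

# Return the temperature of the last step whose starting turn is <= turn_number.
function temperature_at(s::StepTemperature, turn_number::Integer)
    τ = first(s.steps)[2]
    for (start, value) in s.steps
        turn_number >= start || break
        τ = value
    end
    return τ
end

# Explore during the first 20 turns, then play more deterministically.
schedule = StepTemperature([(0, 1.0), (20, 0.3)])
opening = temperature_at(schedule, 5)
endgame = temperature_at(schedule, 40)
```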

Player Instances

AlphaZero.MctsPlayer — Type

MctsPlayer{Game, MctsEnv} <: AbstractPlayer{Game}

A player that selects actions using MCTS.

Constructors

MctsPlayer(mcts::MCTS.Env; τ, niters, timeout=nothing)

Construct a player from an MCTS environment. When computing each move:

If timeout is provided, MCTS simulations are executed for timeout seconds, in groups of niters.
Otherwise, exactly niters MCTS simulations are run.
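The two modes can be sketched as follows. This is a standalone illustration rather than the library's code: run_simulations is a hypothetical name, explore! stands in for "run n MCTS simulations", and the choice to let a started batch finish after the deadline is an assumption of this sketch.

```julia
# Hypothetical driver: `explore!(n)` stands in for "run n MCTS simulations".
# Returns the total number of simulations that were run.
function run_simulations(explore!::Function; niters::Integer, timeout=nothing)
    if isnothing(timeout)
        explore!(niters)
        return niters
    end
    total = 0
    start = time()
    # Batches are not interrupted here, so the last one may overrun the deadline.
    while time() - start < timeout
        explore!(niters)
        total += niters
    end
    return total
end

simulated = Ref(0)
fixed = run_simulations(n -> (simulated[] += n); niters=50)                # one batch of 50
timed = run_simulations(n -> (simulated[] += n); niters=50, timeout=0.01)  # whole batches of 50
```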

The temperature parameter τ can be either a real number or an AbstractSchedule.

MctsPlayer(oracle::MCTS.Oracle, params::MctsParams; timeout=nothing)

Construct an MCTS player from an oracle and an MctsParams structure. If the oracle is a network, this constructor handles copying it, putting it in test mode and moving the copy to the GPU (if necessary).

AlphaZero.RandomPlayer — Type

RandomPlayer{Game} <: AbstractPlayer{Game}

A player that picks actions uniformly at random.

AlphaZero.NetworkPlayer — Type

NetworkPlayer{Game, Net} <: AbstractPlayer{Game}

A player that uses the policy output by a neural network directly, instead of relying on MCTS.

AlphaZero.EpsilonGreedyPlayer — Type

EpsilonGreedyPlayer{Game, Player} <: AbstractPlayer{Game}

A wrapper on a player that makes it choose a random move with a fixed probability $ϵ$.
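One way to picture this wrapper is as a mixture of the wrapped player's policy with the uniform distribution over the available actions. The sketch below is self-contained; epsilon_greedy is a hypothetical helper, and expressing the behaviour as a mixture of distributions is an assumption of this example, not a description of the library's source.

```julia
# Hypothetical helper: with probability ϵ act uniformly at random,
# otherwise follow π. The resulting distribution is a convex mixture.
function epsilon_greedy(π::AbstractVector{<:Real}, ϵ::Real)
    n = length(π)
    uniform = fill(1 / n, n)
    return (1 - ϵ) .* π .+ ϵ .* uniform
end

mixed = epsilon_greedy([0.7, 0.2, 0.1], 0.3)
```

With ϵ = 0 the wrapped policy is returned unchanged, and with ϵ = 1 the result is uniform.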

AlphaZero.PlayerWithTemperature — Type

PlayerWithTemperature{Game, Player} <: AbstractPlayer{Game}

A wrapper on a player that enables overriding the temperature schedule.

AlphaZero.TwoPlayers — Type

TwoPlayers{Game} <: AbstractPlayer{Game}

If white and black are two AbstractPlayer instances, then TwoPlayers(white, black) is a player that behaves as white when white is to play and as black when black is to play.
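The delegation idea can be shown with a tiny standalone sketch. The type Duo and the function active are hypothetical stand-ins: in AlphaZero.jl the two fields would hold AbstractPlayer values and the side to play would come from the game, whereas here it is passed as a plain Bool.

```julia
# Hypothetical stand-in for a pair of players.
struct Duo{W, B}
    white::W
    black::B
end

# Pick the sub-player whose turn it is.
active(d::Duo, white_to_play::Bool) = white_to_play ? d.white : d.black

duo = Duo("engine", "human")
first_mover = active(duo, true)
second_mover = active(duo, false)
```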

Derived Functions

AlphaZero.play_game — Function

play_game(player; flip_probability=0.) :: Trace

Simulate a game played by an AbstractPlayer and return a trace.

For two-player games, please use TwoPlayers.
If the flip_probability argument is set to $p$, the board is flipped randomly at every turn with probability $p$, using GI.apply_random_symmetry.
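The random flip can be pictured with the following standalone sketch. The helper maybe_flip is hypothetical, and the symmetry argument stands in for whatever GI.apply_random_symmetry does for a real game; here the "board" is just a vector and the symmetry is reverse.

```julia
using Random

# Hypothetical helper: with probability p, replace the state by a symmetric one.
maybe_flip(rng::AbstractRNG, state, p::Real, symmetry) =
    rand(rng) < p ? symmetry(state) : state

board = [1, 0, 2]
never = maybe_flip(MersenneTwister(0), board, 0.0, reverse)   # p = 0: state is kept
always = maybe_flip(MersenneTwister(0), board, 1.0, reverse)  # p = 1: state is mirrored
```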

AlphaZero.Trace — Type

Trace{Game, State}

An object that collects all states visited during a game, along with the rewards obtained at each step and the successive player policies to be used as targets.

Constructor

Trace{Game}(initial_state)

Base.push! — Method

Base.push!(t::Trace, π, r, s)

Add a (target policy, reward, new state) triple to a trace.
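To make the shape of a trace concrete, here is a minimal standalone sketch. ToyTrace and its field names are hypothetical and chosen for this example only. The sketch stores one more state than it has policies or rewards, because the initial state is recorded before any move is made.

```julia
# Hypothetical stand-in for a trace; field names are assumptions of this sketch.
struct ToyTrace{State}
    states::Vector{State}
    policies::Vector{Vector{Float64}}
    rewards::Vector{Float64}
end

# Start from the initial state, with no policy or reward recorded yet.
ToyTrace(initial_state::State) where {State} =
    ToyTrace{State}([initial_state], Vector{Float64}[], Float64[])

# Record the policy used in the previous state, the reward, and the new state.
function Base.push!(t::ToyTrace, π, r, s)
    push!(t.policies, π)
    push!(t.rewards, r)
    push!(t.states, s)
    return t
end

t = ToyTrace(0)
push!(t, [0.5, 0.5], 0.0, 1)
push!(t, [0.1, 0.9], 1.0, 2)
```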

AlphaZero.pit — Function

pit(handler, contender, baseline, ngames)

Evaluate two AbstractPlayer instances against each other in a series of games.

Note that this function can only be used with two-player games.

Arguments

handler: this function is called after each simulated game with three arguments: the game number i, the reward r for the contender player and the trace t
ngames: number of games to play

Optional keyword arguments

reset_every: if set, players are reset every reset_every games
color_policy: determines the ColorPolicy, which is ALTERNATE_COLORS by default
flip_probability=0.: see play_game
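The overall loop can be sketched without AlphaZero.jl as follows. Everything here is hypothetical: the enum is a toy copy of the color policies, play stands in for simulating one game and returning white's reward, the game is assumed to be zero-sum, the contender is assumed to start as white when colors alternate, and the handler receives only the game number and the contender's reward (the real handler also receives the trace).

```julia
@enum ToyColorPolicy TOY_ALTERNATE TOY_BASELINE_WHITE TOY_CONTENDER_WHITE

# Decide whether the contender plays white in game i (1-based).
# Which side starts under alternation is an assumption of this sketch.
function contender_is_white(policy::ToyColorPolicy, i::Integer)
    policy == TOY_CONTENDER_WHITE && return true
    policy == TOY_BASELINE_WHITE && return false
    return isodd(i)
end

# Hypothetical driver: `play(contender_white)` returns the reward for white.
function toy_pit(handler, play, ngames; color_policy=TOY_ALTERNATE)
    for i in 1:ngames
        cw = contender_is_white(color_policy, i)
        white_reward = play(cw)
        # Report the reward from the contender's point of view (zero-sum assumption).
        handler(i, cw ? white_reward : -white_reward)
    end
end

rewards = Float64[]
toy_pit((i, r) -> push!(rewards, r), cw -> 1.0, 4)  # white always wins in this toy setup
```

Because the handler is the first argument, the same call can also be written with Julia's do-block syntax.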

AlphaZero.ColorPolicy — Type

@enum ColorPolicy ALTERNATE_COLORS BASELINE_WHITE CONTENDER_WHITE

Policy for attributing colors in a duel between a baseline and a contender.

AlphaZero.interactive! — Function

interactive!(game, white, black)

Launch an interactive session for game::AbstractGame between players white and black. Both players have type AbstractPlayer and one of them is typically Human.

AlphaZero.Human — Type

Human{Game} <: AbstractPlayer{Game}

Human player that queries the standard input for actions.

Does not implement think but instead implements select_move directly.
