Players
Player Interface
AlphaZero.AbstractPlayer — Type

AbstractPlayer{Game}

Abstract type for a player of Game.
AlphaZero.think — Function

think(::AbstractPlayer, state, turn=nothing)

Return a probability distribution over actions as an (actions, π) pair.

The turn argument, if provided, indicates the number of actions that have been played by both players so far in the current game. This is useful because, during self-play, AlphaZero typically drops its temperature parameter after a fixed number of turns.
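As a hedged sketch of implementing this interface, here is a hypothetical player whose think returns a uniform distribution. The state-to-game conversion and the GI.available_actions accessor are assumptions about the game interface, not guaranteed by this page:

```julia
using AlphaZero

# Hypothetical player: assigns equal probability to every legal action.
struct UniformPlayer{Game} <: AbstractPlayer{Game} end

function AlphaZero.think(::UniformPlayer{G}, state, turn=nothing) where G
  game = G(state)                      # assumed state-to-game conversion
  actions = GI.available_actions(game) # assumed game-interface accessor
  n = length(actions)
  return actions, fill(1 / n, n)      # uniform distribution over actions
end
```

Since UniformPlayer implements think, it inherits the default select_move implementation for free.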
AlphaZero.select_move — Function

select_move(player::AbstractPlayer, state, turn=nothing)

Return a single action. A default implementation is provided that samples an action according to the distribution computed by think.
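To illustrate what this default sampling amounts to, here is a self-contained plain-Julia sketch (independent of the AlphaZero API) that draws one action from an (actions, π) pair:

```julia
# A distribution as returned by `think`: actions paired with probabilities.
actions = ["a", "b", "c"]
probs = [0.2, 0.5, 0.3]

# Categorical sampling: pick the first index whose cumulative
# probability reaches a uniform random draw in [0, 1).
r = rand()
i = findfirst(c -> c >= r, cumsum(probs))
chosen = actions[i]
```
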
AlphaZero.reset_player! — Function

reset_player!(::AbstractPlayer)

Reset the internal memory of a player (e.g. the MCTS tree). The default implementation does nothing.
Player Instances
AlphaZero.MctsPlayer — Type

MctsPlayer{Game, MctsEnv} <: AbstractPlayer{Game}

A player that selects actions using MCTS.

Constructors

MctsPlayer(mcts::MCTS.Env; τ, niters, timeout=nothing)

Construct a player from an MCTS environment. When computing each move:

- if timeout is provided, MCTS simulations are executed for timeout seconds by groups of niters
- otherwise, niters MCTS simulations are run

The temperature parameter τ can be either a real number or a StepSchedule.

MctsPlayer(oracle::MCTS.Oracle, params::MctsParams; timeout=nothing)

Construct an MCTS player from an oracle and an MctsParams structure. If the oracle is a network, this constructor handles copying it, putting it in test mode and moving it to the GPU (if necessary).
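As a hedged sketch of the second constructor, where network and mcts_params are hypothetical placeholders standing for a trained oracle and an MctsParams value from an experiment configuration:

```julia
using AlphaZero

# `network` and `mcts_params` are placeholders (not defined on this page).
player = MctsPlayer(network, mcts_params)

# Alternatively, budget two seconds of search per move:
timed_player = MctsPlayer(network, mcts_params, timeout=2.0)
```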
AlphaZero.RandomPlayer — Type

RandomPlayer{Game} <: AbstractPlayer{Game}

A player that picks actions uniformly at random.
AlphaZero.NetworkPlayer — Type

NetworkPlayer{Game, Net} <: AbstractPlayer{Game}

A player that uses the policy output by a neural network directly, instead of relying on MCTS.
AlphaZero.EpsilonGreedyPlayer — Type

EpsilonGreedyPlayer{Game, Player} <: AbstractPlayer{Game}

A wrapper on a player that makes it choose a random move with a fixed probability ϵ.
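As a hedged sketch of combining the player instances above, the positional (player, ϵ) constructor is an assumption inferred from the type parameters, and Game and network are placeholders:

```julia
using AlphaZero

# Wrap a NetworkPlayer so it plays a random move with probability 0.1.
# `Game` and `network` are placeholders (not defined on this page).
greedy = NetworkPlayer{Game}(network)
explorer = EpsilonGreedyPlayer(greedy, 0.1)
```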
Derived Functions
AlphaZero.play_game — Function

play_game(white, black; memory=nothing, flip_probability=0.)

Play a game between two AbstractPlayer and return the reward obtained by white.

- If the memory argument is provided, samples are automatically collected from this game in the given MemoryBuffer.
- If the flip_probability argument is set to p, the board is flipped randomly at every turn with probability p, using GI.random_symmetric_state.
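A minimal hedged sketch, assuming Game is a placeholder game type implementing the game interface:

```julia
using AlphaZero

# Play one game between two uniformly random players and
# inspect the reward obtained by white.
white = RandomPlayer{Game}()   # `Game` is a placeholder game type
black = RandomPlayer{Game}()
z = play_game(white, black)
println("Reward obtained by white: ", z)
```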
AlphaZero.pit — Function

pit(handler, contender, baseline, ngames)

Evaluate two AbstractPlayer against each other in a series of games.

Arguments

- handler: this function is called after each simulated game with two arguments: the game number i and the collected reward z for the contender player
- ngames: number of games to play

Optional keyword arguments

- reset_every: if set, players are reset every reset_every games
- color_policy: determines the ColorPolicy, which is ALTERNATE_COLORS by default
- memory=nothing: memory to use to record samples
- flip_probability=0.: see play_game
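Because handler is the first positional argument, Julia's do-block syntax applies naturally. A hedged sketch, with contender and baseline as hypothetical AbstractPlayer values:

```julia
using AlphaZero

# Estimate the contender's average reward over 100 games.
# `contender` and `baseline` are placeholders (not defined on this page).
total = Ref(0.0)   # Ref avoids assigning to a global from inside the closure
pit(contender, baseline, 100) do i, z
  total[] += z     # `z` is the contender's reward in game number `i`
end
println("Average contender reward: ", total[] / 100)
```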
AlphaZero.ColorPolicy — Type

@enum ColorPolicy ALTERNATE_COLORS BASELINE_WHITE CONTENDER_WHITE

Policy for attributing colors in a duel between a baseline and a contender.
AlphaZero.interactive! — Function

interactive!(game, white, black)

Launch an interactive session for game::AbstractGame between players white and black. Both players have type AbstractPlayer and one of them is typically Human.
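A hedged sketch of launching such a session, with Game and agent as placeholders for a game type and an opponent player:

```julia
using AlphaZero

# Play as white (from the standard input) against an agent playing black.
# `Game` and `agent` are placeholders (not defined on this page).
interactive!(Game(), Human{Game}(), agent)
```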
AlphaZero.Human — Type

Human{Game} <: AbstractPlayer{Game}

Human player that queries the standard input for actions.

Does not implement think but instead implements select_move directly.