Players
Player Interface
AlphaZero.AbstractPlayer — Type

AbstractPlayer{Game}

Abstract type for a player of Game.
AlphaZero.think — Function

think(::AbstractPlayer, state, turn=nothing)

Return a probability distribution over actions as an (actions, π) pair.

The turn argument, if provided, indicates the number of actions that have been played by both players so far in the current game. This is useful because, during self-play, AlphaZero typically drops its temperature parameter after a fixed number of turns.
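As a hedged sketch of implementing this interface, here is a hypothetical player whose think returns a uniform distribution. The state-to-game conversion and the GI.available_actions accessor are assumptions about the game interface, not guaranteed by this page:

```julia
using AlphaZero

# Hypothetical player: assigns equal probability to every legal action.
struct UniformPlayer{Game} <: AbstractPlayer{Game} end

function AlphaZero.think(::UniformPlayer{G}, state, turn=nothing) where G
  game = G(state)                      # assumed state-to-game conversion
  actions = GI.available_actions(game) # assumed game-interface accessor
  n = length(actions)
  return actions, fill(1 / n, n)      # uniform distribution over actions
end
```

Since UniformPlayer implements think, it inherits the default select_move implementation for free.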
AlphaZero.select_move — Function

select_move(player::AbstractPlayer, state, turn=nothing)

Return a single action. A default implementation is provided that samples an action according to the distribution computed by think.
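To illustrate what this default sampling amounts to, here is a self-contained plain-Julia sketch (independent of the AlphaZero API) that draws one action from an (actions, π) pair:

```julia
# A distribution as returned by `think`: actions paired with probabilities.
actions = ["a", "b", "c"]
probs = [0.2, 0.5, 0.3]

# Categorical sampling: pick the first index whose cumulative
# probability reaches a uniform random draw in [0, 1).
r = rand()
i = findfirst(c -> c >= r, cumsum(probs))
chosen = actions[i]
```
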
AlphaZero.reset_player! — Function

reset_player!(::AbstractPlayer)

Reset the internal memory of a player (e.g. the MCTS tree). The default implementation does nothing.
Player Instances
AlphaZero.MctsPlayer — Type

MctsPlayer{Game, MctsEnv} <: AbstractPlayer{Game}

A player that selects actions using MCTS.

Constructors

MctsPlayer(mcts::MCTS.Env; τ, niters, timeout=nothing)

Construct a player from an MCTS environment. When computing each move:

- if timeout is provided, MCTS simulations are executed for timeout seconds by groups of niters
- otherwise, niters MCTS simulations are run

The temperature parameter τ can be either a real number or a StepSchedule.

MctsPlayer(oracle::MCTS.Oracle, params::MctsParams; timeout=nothing)

Construct an MCTS player from an oracle and an MctsParams structure. If the oracle is a network, this constructor handles copying it, putting it in test mode and moving it to the GPU (if necessary).
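As a hedged sketch of the second constructor, where network and mcts_params are hypothetical placeholders standing for a trained oracle and an MctsParams value from an experiment configuration:

```julia
using AlphaZero

# `network` and `mcts_params` are placeholders (not defined on this page).
player = MctsPlayer(network, mcts_params)

# Alternatively, budget two seconds of search per move:
timed_player = MctsPlayer(network, mcts_params, timeout=2.0)
```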
AlphaZero.RandomPlayer — Type

RandomPlayer{Game} <: AbstractPlayer{Game}

A player that picks actions uniformly at random.
AlphaZero.NetworkPlayer — Type

NetworkPlayer{Game, Net} <: AbstractPlayer{Game}

A player that uses the policy output by a neural network directly, instead of relying on MCTS.
AlphaZero.EpsilonGreedyPlayer — Type

EpsilonGreedyPlayer{Game, Player} <: AbstractPlayer{Game}

A wrapper on a player that makes it choose a random move with a fixed probability ϵ.
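As a hedged sketch of combining the player instances above, the positional (player, ϵ) constructor is an assumption inferred from the type parameters, and Game and network are placeholders:

```julia
using AlphaZero

# Wrap a NetworkPlayer so it plays a random move with probability 0.1.
# `Game` and `network` are placeholders (not defined on this page).
greedy = NetworkPlayer{Game}(network)
explorer = EpsilonGreedyPlayer(greedy, 0.1)
```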
Derived Functions
AlphaZero.play_game — Function

play_game(white, black; memory=nothing, flip_probability=0.)

Play a game between two AbstractPlayer and return the reward obtained by white.

- If the memory argument is provided, samples are automatically collected from this game in the given MemoryBuffer.
- If the flip_probability argument is set to p, the board is flipped randomly at every turn with probability p, using GI.random_symmetric_state.
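A minimal hedged sketch, assuming Game is a placeholder game type implementing the game interface:

```julia
using AlphaZero

# Play one game between two uniformly random players and
# inspect the reward obtained by white.
white = RandomPlayer{Game}()   # `Game` is a placeholder game type
black = RandomPlayer{Game}()
z = play_game(white, black)
println("Reward obtained by white: ", z)
```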
AlphaZero.pit — Function

pit(handler, contender, baseline, ngames)

Evaluate two AbstractPlayer against each other in a series of games.

Arguments

- handler: this function is called after each simulated game with two arguments: the game number i and the collected reward z for the contender player
- ngames: number of games to play

Optional keyword arguments

- reset_every: if set, players are reset every reset_every games
- color_policy: determines the ColorPolicy, which is ALTERNATE_COLORS by default
- memory=nothing: memory to use to record samples
- flip_probability=0.: see play_game
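Because handler is the first positional argument, Julia's do-block syntax applies naturally. A hedged sketch, with contender and baseline as hypothetical AbstractPlayer values:

```julia
using AlphaZero

# Estimate the contender's average reward over 100 games.
# `contender` and `baseline` are placeholders (not defined on this page).
total = Ref(0.0)   # Ref avoids assigning to a global from inside the closure
pit(contender, baseline, 100) do i, z
  total[] += z     # `z` is the contender's reward in game number `i`
end
println("Average contender reward: ", total[] / 100)
```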
AlphaZero.ColorPolicy — Type

@enum ColorPolicy ALTERNATE_COLORS BASELINE_WHITE CONTENDER_WHITE

Policy for attributing colors in a duel between a baseline and a contender.
AlphaZero.interactive! — Function

interactive!(game, white, black)

Launch an interactive session for game::AbstractGame between players white and black. Both players have type AbstractPlayer and one of them is typically Human.
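A hedged sketch of launching such a session, with Game and agent as placeholders for a game type and an opponent player:

```julia
using AlphaZero

# Play as white (from the standard input) against an agent playing black.
# `Game` and `agent` are placeholders (not defined on this page).
interactive!(Game(), Human{Game}(), agent)
```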
AlphaZero.Human — Type

Human{Game} <: AbstractPlayer{Game}

Human player that queries the standard input for actions.

Does not implement think but instead implements select_move directly.