Players and Simulations
Player Interface
AlphaZero.AbstractPlayer — Type
AbstractPlayer
Abstract type for a game player.
AlphaZero.think — Function
think(::AbstractPlayer, game)
Return a probability distribution over available actions as a (actions, π) pair.
AlphaZero.select_move — Function
select_move(player::AbstractPlayer, game, turn_number)
Return a single action. A default implementation is provided that samples an action according to the distribution computed by think, with a temperature given by player_temperature.
AlphaZero.reset_player! — Function
reset_player!(::AbstractPlayer)
Reset the internal memory of a player (e.g. the MCTS tree). The default implementation does nothing.
AlphaZero.player_temperature — Function
player_temperature(::AbstractPlayer, game, turn_number)
Return the player temperature, given the number of actions that have been played before by both players in the current game.
A default implementation is provided that always returns 1.
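The way a distribution returned by think and a temperature could combine into a single move can be sketched as follows. This is a minimal illustration, not the library's implementation: apply_temperature_sketch and sample_action_sketch are hypothetical names, and the rule "raise each probability to the power 1/τ, then renormalize" is the usual convention, assumed here.

```julia
using Random

# Hypothetical helper, not part of the documented API: reshape a
# distribution π with temperature τ (τ < 1 sharpens, τ > 1 flattens).
function apply_temperature_sketch(π::AbstractVector{<:Real}, τ::Real)
    if τ == 0
        # Zero temperature: all mass goes to the most probable action.
        p = zeros(length(π))
        p[argmax(π)] = 1.0
        return p
    end
    p = float.(π) .^ (1 / τ)
    return p ./ sum(p)
end

# Hypothetical helper: draw one action from (actions, π) at temperature τ
# by walking the cumulative distribution.
function sample_action_sketch(rng::AbstractRNG, actions, π, τ)
    p = apply_temperature_sketch(π, τ)
    u = rand(rng)
    acc = 0.0
    for (a, pa) in zip(actions, p)
        acc += pa
        u < acc && return a
    end
    return last(actions)  # guard against floating-point round-off
end
```

With τ = 1 the distribution is left unchanged (up to normalization), which matches the default temperature described above.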
Player Instances
AlphaZero.AlphaZeroPlayer — Function
AlphaZeroPlayer(::Env; [timeout, mcts_params, use_gpu])
Create an AlphaZero player from the current training environment.
Note that the returned player may be slow as it does not batch MCTS requests.
AlphaZero.MctsPlayer — Type
MctsPlayer{MctsEnv} <: AbstractPlayer
A player that selects actions using MCTS.
Constructors
MctsPlayer(mcts::MCTS.Env; τ, niters, timeout=nothing)
Construct a player from an MCTS environment. When computing each move:
- if timeout is provided, MCTS simulations are executed for timeout seconds by groups of niters
- otherwise, niters MCTS simulations are run
The temperature parameter τ can be either a real number or an AbstractSchedule.
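The timeout rule described above can be sketched as a small loop. This is an illustration under stated assumptions: run_simulations_sketch is a hypothetical name, and the explore! argument is a placeholder callback standing in for "run n MCTS simulations", not the library's own function.

```julia
# Hypothetical helper: `explore!(n)` is a stand-in callback that runs
# n MCTS simulations. Returns the total number of simulations requested.
function run_simulations_sketch(explore!, niters::Integer; timeout=nothing)
    total = 0
    if isnothing(timeout)
        # No timeout: run exactly `niters` simulations.
        explore!(niters)
        total = niters
    else
        # Timeout given: keep running groups of `niters` simulations
        # until `timeout` seconds have elapsed.
        start = time()
        while time() - start < timeout
            explore!(niters)
            total += niters
        end
    end
    return total
end
```

Because the elapsed time is only checked between groups, a large niters makes the timeout coarser: the last group always runs to completion.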
MctsPlayer(game_spec::AbstractGameSpec, oracle,
params::MctsParams; timeout=nothing)
Construct an MCTS player from an oracle and an MctsParams structure.
AlphaZero.RandomPlayer — Type
RandomPlayer <: AbstractPlayer
A player that picks actions uniformly at random.
AlphaZero.NetworkPlayer — Type
NetworkPlayer{Net} <: AbstractPlayer
A player that uses the policy output by a neural network directly, instead of relying on MCTS. The given neural network must be in test mode.
AlphaZero.EpsilonGreedyPlayer — Type
EpsilonGreedyPlayer{Player} <: AbstractPlayer
A wrapper on a player that makes it choose a random move with a fixed probability $ϵ$.
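Choosing a uniformly random move with probability $ϵ$, and otherwise following the wrapped player's distribution π, is equivalent to sampling from the mixture $(1 - ϵ)\,π + ϵ\,u$, where $u$ is uniform over the available actions. A minimal sketch, in which epsilon_mix_sketch is a hypothetical name and not part of the documented API:

```julia
# Hypothetical helper: mix a policy π with the uniform distribution
# over the same actions, giving weight ϵ to the uniform part.
function epsilon_mix_sketch(π::AbstractVector{<:Real}, ϵ::Real)
    n = length(π)
    uniform = fill(1 / n, n)
    return (1 - ϵ) .* π .+ ϵ .* uniform
end
```

Whether the wrapper mixes distributions this way or flips a coin at move-selection time is not stated in this section; the two approaches yield the same move probabilities.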
AlphaZero.PlayerWithTemperature — Type
PlayerWithTemperature{Player} <: AbstractPlayer
A wrapper on a player that enables overwriting the temperature schedule.
AlphaZero.TwoPlayers — Type
TwoPlayers <: AbstractPlayer
If white and black are two AbstractPlayer, then TwoPlayers(white, black) is a player that behaves as white when white is to play and as black when black is to play.
Game Simulations
Simulation traces
AlphaZero.Trace — Type
Trace{State}
An object that collects all states visited during a game, along with the rewards obtained at each step and the successive player policies to be used as targets for the neural network.
Constructor
Trace(initial_state)
Base.push! — Method
Base.push!(t::Trace, π, r, s)
Add a (target policy, reward, new state) triple to a trace.
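The bookkeeping described above can be pictured with a small stand-alone structure. This is a sketch only: TraceSketch and its field names are hypothetical and are not claimed to match the library's Trace type.

```julia
# Hypothetical stand-in for a trace: one more state than policies/rewards,
# because the initial state is recorded before any move is played.
struct TraceSketch{State}
    states::Vector{State}
    policies::Vector{Vector{Float64}}
    rewards::Vector{Float64}
end

# Start a trace from the initial state, with no moves recorded yet.
TraceSketch(initial_state) =
    TraceSketch([initial_state], Vector{Float64}[], Float64[])

# Record one step: the policy used, the reward obtained, the state reached.
function Base.push!(t::TraceSketch, π, r, s)
    push!(t.policies, π)
    push!(t.rewards, r)
    push!(t.states, s)
    return t
end
```

After k calls to push!, such a trace holds k + 1 states, k policies and k rewards.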
Playing a single game
AlphaZero.play_game — Function
play_game(gspec::AbstractGameSpec, player; flip_probability=0.) :: Trace
Simulate a game by an AbstractPlayer.
- For two-player games, please use TwoPlayers.
- If the flip_probability argument is set to $p$, the board is flipped randomly at every turn with probability $p$, using GI.apply_random_symmetry!.
Playing multiple games in a distributed fashion
AlphaZero.Simulator — Type
Simulator(make_player, make_oracles, measure)
A distributed simulator that encapsulates the details of running simulations across multiple threads and multiple machines.
Arguments
- make_oracles: a function that takes no argument and returns the oracles used by the player, which can be either nothing, a single oracle or a pair of oracles.
- make_player: a function that takes as an argument the result of make_oracles and builds a player from it. In practice, an oracle returned by make_oracles may be replaced by a BatchedOracle before it is passed to make_player, which is why these two functions are specified separately.
- measure(trace, colors_flipped, player): the function that is used to take measurements after each game simulation.
AlphaZero.record_trace — Function
record_trace
A measurement function to be passed to a Simulator that produces named tuples with two fields: trace::Trace and colors_flipped::Bool.
AlphaZero.simulate — Function
simulate(::Simulator, ::AbstractGameSpec, ::SimParams; <kwargs>)
Play a series of games using a given Simulator.
Keyword Arguments
- game_simulated is called every time a game simulation is completed (with no arguments).
Return
Return a vector of objects computed by simulator.measure.
AlphaZero.simulate_distributed — Function
simulate_distributed(::Simulator, ::AbstractGameSpec, ::SimParams; <kwargs>)
Identical to simulate but splits the work across all available distributed workers, whose number is given by Distributed.nworkers().
Utilities for playing interactive games
AlphaZero.Human — Type
Human <: AbstractPlayer
Human player that queries the standard input for actions.
Does not implement think but instead implements select_move directly.
AlphaZero.interactive! — Function
interactive!(game)
interactive!(gspec)
interactive!(game, player)
interactive!(gspec, player)
interactive!(game, white, black)
interactive!(gspec, white, black)
Launch a possibly interactive game session.
This function takes either an AbstractGameSpec or AbstractGameEnv as an argument.