Players

Player Interface

AlphaZero.think - Function
think(::AbstractPlayer, game)

Return a probability distribution over actions as a (actions, π) pair.

AlphaZero.select_move - Function
select_move(player::AbstractPlayer, game, turn_number)

Return a single action. A default implementation is provided that samples an action according to the distribution computed by think, with a temperature given by player_temperature.
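The sampling step described above can be sketched without the library. This is a minimal illustration, not the package's implementation: the names apply_temperature_sketch and sample_action_sketch are hypothetical, and raising probabilities to the power 1/τ (with τ = 0 treated as the greedy limit) is an assumed convention that the text above does not spell out.

```julia
using Random

# Sharpen (τ < 1) or flatten (τ > 1) a probability vector π.
# τ == 0 is treated as the greedy limit: all mass goes to the first
# most likely action. (Assumed convention, see the lead-in.)
function apply_temperature_sketch(π::AbstractVector{<:Real}, τ::Real)
    if iszero(τ)
        greedy = zeros(Float64, length(π))
        greedy[argmax(π)] = 1.0
        return greedy
    end
    scaled = float.(π) .^ (1 / τ)
    return scaled ./ sum(scaled)
end

# Draw one action from `actions` according to the tempered distribution.
function sample_action_sketch(rng::AbstractRNG, actions, π, τ)
    probs = apply_temperature_sketch(π, τ)
    u = rand(rng)
    acc = 0.0
    for (a, p) in zip(actions, probs)
        acc += p
        u < acc && return a
    end
    return last(actions)  # guard against floating-point round-off
end
```

With τ = 1 the distribution returned by think is sampled unchanged; smaller values concentrate the choice on the most likely actions.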

AlphaZero.reset_player! - Function
reset_player!(::AbstractPlayer)

Reset the internal memory of a player (e.g. the MCTS tree). The default implementation does nothing.

AlphaZero.player_temperature - Function
player_temperature(::AbstractPlayer, game, turn_number)

Return the player temperature, given the number of actions that have been played before by both players in the current game.

A default implementation is provided that always returns 1.
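As an illustration of a temperature that depends on the turn number, here is a small self-contained sketch. The function name, the cutoff and both temperature values are hypothetical choices made for this example; nothing here is taken from the library.

```julia
# Hypothetical step schedule: explore during the first `cutoff` turns,
# then play almost greedily. `turn_number` counts the moves already
# played by both players in the current game.
function step_temperature_sketch(turn_number::Integer;
                                 cutoff=10, early=1.0, late=0.2)
    return turn_number < cutoff ? early : late
end
```

A custom player type could return such a value from its own player_temperature method instead of relying on the default of 1.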


Player Instances

AlphaZero.MctsPlayer - Type
MctsPlayer{Game, MctsEnv} <: AbstractPlayer{Game}

A player that selects actions using MCTS.

Constructors

MctsPlayer(mcts::MCTS.Env; τ, niters, timeout=nothing)

Construct a player from an MCTS environment. When computing each move:

  • if timeout is provided, MCTS simulations are run in batches of niters until timeout seconds have elapsed
  • otherwise, niters MCTS simulations are run
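The two budgeting modes in the list above can be sketched as a plain loop. This is an assumption-laden illustration: run_budget_sketch and its simulate! callback are hypothetical names, and whether the library checks the clock before or after each batch is not stated in the text, so the check-before-each-batch choice here is only one possibility.

```julia
# `simulate!(n)` stands for "run n MCTS simulations" (hypothetical callback).
# Returns the total number of simulations requested.
function run_budget_sketch(simulate!, niters::Integer; timeout=nothing)
    if isnothing(timeout)
        simulate!(niters)          # fixed budget: exactly niters simulations
        return niters
    end
    deadline = time() + timeout
    total = 0
    while time() < deadline        # time budget: whole batches of niters
        simulate!(niters)
        total += niters
    end
    return total
end
```

Because the clock is only consulted between batches, the last batch may finish slightly after the deadline; a smaller niters gives finer control over the time spent per move.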

The temperature parameter τ can be either a real number or an AbstractSchedule.

MctsPlayer(oracle::MCTS.Oracle, params::MctsParams; timeout=nothing)

Construct an MCTS player from an oracle and an MctsParams structure.

AlphaZero.NetworkPlayer - Type
NetworkPlayer{Game, Net} <: AbstractPlayer{Game}

A player that uses the policy output by a neural network directly, instead of relying on MCTS. The given neural network must be in test mode.

AlphaZero.EpsilonGreedyPlayer - Type
EpsilonGreedyPlayer{Game, Player} <: AbstractPlayer{Game}

A wrapper around a player that makes it choose a random move with a fixed probability ϵ.
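Choosing a random move with probability ϵ and otherwise following the wrapped player amounts, in distribution, to mixing the wrapped player's policy with a random one. The sketch below assumes that "random" means uniform over the available actions, which the text does not state explicitly; epsilon_greedy_sketch is a hypothetical name.

```julia
# Mix a policy π with the uniform distribution over the same actions.
function epsilon_greedy_sketch(π::AbstractVector{<:Real}, ϵ::Real)
    0 <= ϵ <= 1 || throw(ArgumentError("ϵ must lie in [0, 1]"))
    n = length(π)
    uniform = fill(1 / n, n)
    return (1 - ϵ) .* π .+ ϵ .* uniform
end

epsilon_greedy_sketch([1.0, 0.0], 0.5)  # [0.75, 0.25]
```

Setting ϵ = 0 recovers the wrapped player's distribution, and ϵ = 1 yields a uniformly random player.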

AlphaZero.TwoPlayers - Type
TwoPlayers{Game} <: AbstractPlayer{Game}

If white and black are two AbstractPlayer instances, then TwoPlayers(white, black) is a player that behaves as white when white is to play and as black when black is to play.


Derived Functions

AlphaZero.Trace - Type
Trace{Game, State}

An object that collects all states visited during a game, along with the rewards obtained at each step and the successive player policies to be used as targets.

Constructor

Trace{Game}(initial_state)

Base.push! - Method
Base.push!(t::Trace, π, r, s)

Add a (target policy, reward, new state) triple to a trace.
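To make the bookkeeping concrete, here is a self-contained stand-in for a trace. TraceSketch and its field names are hypothetical and are not the library's Trace type; the sketch only mirrors the behavior described above, where a trace starts from an initial state and grows by one (policy, reward, state) triple per move.

```julia
# Stand-in for a game trace: one more state than policies or rewards,
# because the initial state is stored before any move is played.
struct TraceSketch{State}
    states::Vector{State}
    policies::Vector{Vector{Float64}}
    rewards::Vector{Float64}
end

TraceSketch(initial_state::State) where {State} =
    TraceSketch{State}([initial_state], Vector{Float64}[], Float64[])

# Append a (target policy, reward, new state) triple.
function Base.push!(t::TraceSketch, π, r, s)
    push!(t.policies, π)
    push!(t.rewards, r)
    push!(t.states, s)
    return t
end

# Number of recorded moves.
Base.length(t::TraceSketch) = length(t.rewards)

t = TraceSketch("s0")
push!(t, [0.5, 0.5], 0.0, "s1")
push!(t, [1.0, 0.0], 1.0, "s2")
```

After the two calls to push!, the sketch holds three states, two policies and two rewards.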

AlphaZero.interactive! - Function
interactive!(game, white, black)

Launch an interactive session for game::AbstractGame between players white and black. Both players have type AbstractPlayer and one of them is typically Human.

AlphaZero.Human - Type
Human{Game} <: AbstractPlayer{Game}

Human player that queries the standard input for actions.

Does not implement think but instead implements select_move directly.


Distributed Simulator

AlphaZero.Simulator - Type
Simulator(make_player, oracles, measure)

A distributed simulator that encapsulates the details of running simulations across multiple threads and multiple machines.

Arguments

  • make_oracles: a function that takes no argument and returns the oracles used by the player, which can be either nothing, a single oracle or a pair of oracles
  • make_player: a function that takes the oracles returned by make_oracles as its argument and builds a player from them
  • measure(trace, colors_flipped, player): the function used to take measurements after each game simulation

AlphaZero.record_trace - Function
record_trace

A measurement function to be passed to a Simulator that produces named tuples with two fields: trace::Trace and colors_flipped::Bool.
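A measurement function with the measure(trace, colors_flipped, player) shape described for Simulator could look like the sketch below. Both functions are hypothetical stand-ins written for illustration; the second one additionally assumes that length is defined on the trace object, which this page does not confirm.

```julia
# Stand-in with the behavior described above: keep the trace and the
# color-flip flag in a named tuple, ignore the player.
record_trace_sketch(trace, colors_flipped, player) =
    (trace = trace, colors_flipped = colors_flipped)

# Hypothetical alternative that keeps only the number of recorded moves.
count_moves_sketch(trace, colors_flipped, player) = length(trace)

m = record_trace_sketch([:move1, :move2, :move3], true, nothing)
```

Any function with this three-argument shape can serve as a measurement; its return values are what a simulation run collects.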

AlphaZero.ColorPolicy - Type
@enum ColorPolicy ALTERNATE_COLORS BASELINE_WHITE CONTENDER_WHITE

Policy for attributing colors in a duel between a baseline and a contender.

AlphaZero.simulate - Function
simulate(::Simulator; <keyword arguments>)

Play a series of games using a given Simulator.

Keyword Arguments

  • num_games: number of games to play
  • num_workers: number of worker tasks to spawn
  • game_simulated: called every time a game simulation is completed
  • reset_every: if set, players are reset every reset_every games
  • color_policy: either nothing or a ColorPolicy
  • flip_probability=0.: see play_game

Return

Return a vector of objects returned by simulator.measure.
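The interplay of num_games, num_workers and game_simulated can be sketched with plain Julia tasks and channels. This is not the library's implementation: simulate_sketch and play_game are hypothetical names, everything runs as cooperative tasks on a single process, and player resets, color policies and multi-machine distribution are left out.

```julia
# Distribute `num_games` jobs over `num_workers` tasks and collect one
# measurement per game. `play_game(i)` stands for "simulate game i and
# return its measurement" (hypothetical callback).
function simulate_sketch(play_game; num_games, num_workers,
                         game_simulated = () -> nothing)
    jobs = Channel{Int}(num_games)
    for i in 1:num_games
        put!(jobs, i)
    end
    close(jobs)                      # workers stop once the queue is drained
    results = Channel{Any}(num_games)
    @sync for _ in 1:num_workers
        @async for i in jobs
            put!(results, play_game(i))
            game_simulated()         # progress callback, once per game
        end
    end
    close(results)
    return collect(results)          # order depends on task scheduling
end

done = Ref(0)
out = simulate_sketch(i -> i^2; num_games=5, num_workers=2,
                      game_simulated = () -> (done[] += 1))
```

The returned vector contains one entry per game, in completion order rather than submission order, so callers that need a stable order should sort or tag the results.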
