Game Interface
AlphaZero.GameInterface — Module
A generic interface for single-player games and two-player zero-sum games.
Stochastic games and intermediate rewards are supported. By convention, rewards are expressed from the point of view of the player called white. In two-player zero-sum games, the player trying to minimize the reward is called black.
A test suite is provided in AlphaZero.Scripts to check that your environment complies with this interface.
Mandatory Interface
The game interface of AlphaZero.jl differs from many standard RL interfaces by making a distinction between a game specification and a game environment:
- A specification holds all static information about a game, which does not depend on the current state (e.g. the world dimensions in a grid-world environment).
- In contrast, an environment holds information about the current state of the game (e.g. the player's position in a grid-world environment).
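The split between the two can be sketched in plain Julia as follows. This is a minimal illustration, not the library's code: the abstract types and the `init` and `spec` functions are local stand-ins so that the snippet runs without AlphaZero.jl installed, and `GridWorldSpec` and `GridWorldEnv` are hypothetical names. In an actual implementation one would presumably subtype the abstract types of the interface and extend its functions instead.

```julia
# Local stand-ins for the abstract types of the interface (assumption: used
# here only so that the sketch is self-contained).
abstract type AbstractGameSpec end
abstract type AbstractGameEnv end

# Static information only: the world dimensions never change during a game.
struct GridWorldSpec <: AbstractGameSpec
  width :: Int
  height :: Int
end

# State-dependent information: the player's position changes after each move.
mutable struct GridWorldEnv <: AbstractGameEnv
  spec :: GridWorldSpec
  position :: Tuple{Int, Int}
end

# Local counterparts of `init` and `spec`.
init(gspec::GridWorldSpec) = GridWorldEnv(gspec, (1, 1))
spec(game::GridWorldEnv) = game.spec

gspec = GridWorldSpec(5, 4)
game = init(gspec)
game.position = (2, 1)          # the environment changes...
@assert spec(game) === gspec    # ...while the specification stays the same
```

Keeping the specification immutable means that many environments can share it safely.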
Game Specifications
AlphaZero.GameInterface.AbstractGameSpec — Type
AbstractGameSpec
Abstract type for a game specification.
The specification holds all static information about a game, which does not depend on the current state.
AlphaZero.GameInterface.two_players — Function
two_players(::AbstractGameSpec) :: Bool
Return whether or not a game is a two-player game.
AlphaZero.GameInterface.actions — Function
actions(::AbstractGameSpec)
Return the vector of all game actions.
AlphaZero.GameInterface.vectorize_state — Function
vectorize_state(::AbstractGameSpec, state) :: Array{Float32}
Return a vectorized representation of a given state.
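As an illustration of what a vectorized representation might look like, here is a sketch that one-hot encodes a tic-tac-toe board into a Float32 array. The board encoding (a vector of nine integers with 0 for empty, 1 for white and -1 for black) and the name `vectorize_board` are assumptions made for this example; they are not part of the interface.

```julia
# Hypothetical board encoding: 9 cells, 0 = empty, 1 = white, -1 = black.
# One plane per cell content, giving a 3×3×3 Float32 array.
function vectorize_board(board::Vector{Int})
  # 9×3 matrix: entry (i, k) is 1 if cell i holds the k-th value of (0, 1, -1).
  x = Float32[board[i] == v for i in 1:9, v in (0, 1, -1)]
  return reshape(x, 3, 3, 3)
end

x = vectorize_board([1, 0, 0, 0, -1, 0, 0, 0, 0])
@assert eltype(x) == Float32
@assert size(x) == (3, 3, 3)   # the kind of tuple that state_dim would report
```

Each cell is set in exactly one plane, so the entries of the array always sum to nine.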
Game Environments
AlphaZero.GameInterface.AbstractGameEnv — Type
AbstractGameEnv
Abstract base type for a game environment.
Intuitively, a game environment holds a game specification and a current state.
AlphaZero.GameInterface.init — Function
init(::AbstractGameSpec) :: AbstractGameEnv
Create a new game environment in a (possibly random) initial state.
AlphaZero.GameInterface.spec — Function
spec(game::AbstractGameEnv) :: AbstractGameSpec
Return the game specification of an environment.
AlphaZero.GameInterface.set_state! — Function
set_state!(game::AbstractGameEnv, state)
Modify the state of a game environment in place.
AlphaZero.GameInterface.current_state — Function
current_state(game::AbstractGameEnv)
Return the game state.
The state returned by this function may be stored (e.g. in the MCTS tree) and must therefore either be fresh or persistent. If in doubt, you should make a copy.
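The following sketch shows why this matters when the state is a mutable array. `BoardEnv`, `unsafe_state` and `safe_state` are hypothetical names used only for this illustration.

```julia
# Hypothetical environment whose state is a mutable vector.
mutable struct BoardEnv
  board :: Vector{Int}
end

unsafe_state(g::BoardEnv) = g.board        # returns an alias of the internal buffer
safe_state(g::BoardEnv) = copy(g.board)    # returns a fresh copy

g = BoardEnv([0, 0, 0])
stored_unsafe = unsafe_state(g)   # e.g. what a search tree might hold on to
stored_safe = safe_state(g)

g.board[1] = 1                    # the environment is later updated in place

@assert stored_unsafe == [1, 0, 0]   # the stored state silently changed
@assert stored_safe == [0, 0, 0]     # the copy is unaffected
```

An alternative to copying is to represent states with immutable values (tuples, immutable structs), which are persistent by construction.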
AlphaZero.GameInterface.game_terminated — Function
game_terminated(::AbstractGameEnv)
Return a boolean indicating whether or not the game is in a terminal state.
AlphaZero.GameInterface.white_playing — Function
white_playing(::AbstractGameEnv) :: Bool
Return true if white is to play and false otherwise.
For a one-player game, this function must always return true.
AlphaZero.GameInterface.actions_mask — Function
actions_mask(::AbstractGameEnv)
Return a boolean mask indicating what actions are available.
The following identities must hold:
game_terminated(game) || any(actions_mask(game))
length(actions_mask(game)) == length(actions(spec(game)))
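These two identities can be checked on concrete values. The sketch below works on plain data rather than on an actual environment: the tic-tac-toe board encoding and the helper `mask_is_valid` are assumptions made for the example.

```julia
# Hypothetical tic-tac-toe data: one action per cell, 0 marks an empty cell.
all_actions = collect(1:9)
board = [1, 0, 0, 0, -1, 0, 0, 0, 0]
terminated = false

# An action is available exactly when its cell is empty.
mask = board .== 0

# Hypothetical helper checking both identities on plain values.
mask_is_valid(mask, all_actions, terminated) =
  (terminated || any(mask)) && length(mask) == length(all_actions)

@assert mask_is_valid(mask, all_actions, terminated)

# Logical indexing recovers the vector of available actions from the mask.
@assert all_actions[mask] == [2, 3, 4, 6, 7, 8, 9]
```

An all-false mask in a non-terminal state would leave the current player without a legal move, which is what the first identity rules out.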
AlphaZero.GameInterface.play! — Function
play!(game::AbstractGameEnv, action)
Update the game environment by making the current player perform action. Note that this function does not have to be deterministic.
AlphaZero.GameInterface.white_reward — Function
white_reward(game::AbstractGameEnv)
Return the intermediate reward obtained by the white player after the last transition step. The result is undetermined when called at an initial state.
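To show how the mandatory functions fit together, here is a sketch of a toy single-player game and of a loop that plays it while accumulating intermediate rewards. The game itself (`CountdownSpec`, `CountdownEnv`), its reward of -1 per move and the `greedy_return` helper are all hypothetical. The functions are defined locally as plain Julia functions so that the snippet runs on its own; an actual implementation would instead extend the functions of the interface and subtype its abstract types.

```julia
# Toy game: start from a counter n and subtract 1 or 2 until it reaches 0.
struct CountdownSpec
  start :: Int
end

mutable struct CountdownEnv
  spec :: CountdownSpec
  n :: Int
  last_reward :: Float64
end

# Specification-level functions (local stand-ins for the interface).
two_players(::CountdownSpec) = false
actions(::CountdownSpec) = [1, 2]

# Environment-level functions (local stand-ins for the interface).
init(s::CountdownSpec) = CountdownEnv(s, s.start, 0.0)
spec(g::CountdownEnv) = g.spec
current_state(g::CountdownEnv) = g.n            # an Int is immutable, hence safe to store
set_state!(g::CountdownEnv, n) = (g.n = n; g)
game_terminated(g::CountdownEnv) = g.n == 0
white_playing(::CountdownEnv) = true            # single-player game
actions_mask(g::CountdownEnv) = [a <= g.n for a in actions(spec(g))]
white_reward(g::CountdownEnv) = g.last_reward

function play!(g::CountdownEnv, a)
  g.n -= a
  g.last_reward = -1.0    # every move costs one unit of reward
  return g
end

# Play with the largest available action and sum the intermediate rewards.
function greedy_return(s::CountdownSpec)
  g = init(s)
  total = 0.0
  while !game_terminated(g)
    a = last(actions(spec(g))[actions_mask(g)])
    play!(g, a)
    total += white_reward(g)   # only queried after a transition
  end
  return total
end

@assert greedy_return(CountdownSpec(5)) == -3.0   # 5 → 3 → 1 → 0
```

Note that `white_reward` is only called after `play!`, consistent with the result being undetermined at an initial state.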
Optional Interface
Interface for Interactive Tools
These functions are required for the default User Interface to work well.
AlphaZero.GameInterface.action_string — Function
action_string(::AbstractGameSpec, action) :: String
Return a human-readable string representing the provided action.
AlphaZero.GameInterface.parse_action — Function
parse_action(::AbstractGameSpec, str::String)
Return the action described by string str or nothing if str does not denote a valid action.
AlphaZero.GameInterface.read_state — Function
read_state(game_spec::AbstractGameSpec)
Read a state from the standard input. Return the corresponding state (with type state_type(game_spec)) or nothing in case of an invalid input.
AlphaZero.GameInterface.render — Function
render(game::AbstractGameEnv)
Print the game state on the standard output.
Other Optional Functions
AlphaZero.GameInterface.heuristic_value — Function
heuristic_value(game::AbstractGameEnv)
Return a heuristic estimate of the state value for the current player.
The given state must be nonfinal and returned values must belong to the $(-∞, ∞)$ interval.
This function is not needed by AlphaZero but it is useful for building baselines such as minmax players.
AlphaZero.GameInterface.symmetries — Function
symmetries(::AbstractGameSpec, state)
Return the vector of all pairs (s, σ) where:
- s is the image of state by a nonidentical symmetry
- σ is the associated actions permutation, as an integer vector of size num_actions(game).
A default implementation is provided that returns an empty vector.
Note that the current state of the passed environment is ignored by this function.
Example
In the game of tic-tac-toe, there are eight symmetries that can be obtained by composing reflections and rotations of the board (including the identity symmetry). Since only nonidentical symmetries are returned, symmetries yields seven pairs for each state.
Property
If (s2, σ) is a symmetry for state s1, then mask2 == mask1[σ] must hold where mask1 and mask2 are the available action masks for s1 and s2 respectively.
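The property can be checked on a concrete case. The sketch below uses a hypothetical tic-tac-toe encoding (nine cells stored row by row, 0 for an empty cell, one action per cell) and a left-right mirror of the board as the symmetry. The convention that the image state is obtained as s1[σ] is an assumption of this example.

```julia
# Hypothetical encoding: cell index i = 3 * (row - 1) + column, 0 = empty.
s1 = [1,  0, 0,
      0, -1, 0,
      0,  0, 0]

# Left-right mirror: column c is mapped to column 4 - c within each row.
σ = [3, 2, 1, 6, 5, 4, 9, 8, 7]

# Image of s1 under the symmetry (assumed convention: s2[i] == s1[σ[i]]).
s2 = s1[σ]

# An action (playing in a cell) is available when the cell is empty.
available(s) = s .== 0
mask1 = available(s1)
mask2 = available(s2)

@assert s2 == [0, 0, 1, 0, -1, 0, 0, 0, 0]
@assert mask2 == mask1[σ]    # the property stated above
@assert sort(σ) == 1:9       # σ is a permutation of the action indices
```

Under this convention, any quantity indexed by actions for s1, such as a policy vector, is carried over to s2 by the same indexing with σ.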
Derived Functions
Operations on Specifications
AlphaZero.GameInterface.state_type — Function
state_type(::AbstractGameSpec)
Return the state type associated with a game.
State objects must be persistent, or appear as such, since they are stored in the MCTS tree without copying. They also have to be comparable and hashable.
AlphaZero.GameInterface.state_dim — Function
state_dim(::AbstractGameSpec)
Return a tuple that indicates the shape of a vectorized state representation.
AlphaZero.GameInterface.state_memsize — Function
state_memsize(::AbstractGameSpec)
Return the memory footprint occupied by a state of the given game.
The computation is based on a random initial state, assuming that all states have an identical footprint.
AlphaZero.GameInterface.action_type — Function
action_type(::AbstractGameSpec)
Return the action type associated with a game.
AlphaZero.GameInterface.num_actions — Function
num_actions(::AbstractGameSpec)
Return the total number of actions associated with a game.
AlphaZero.GameInterface.init — Method
init(::AbstractGameSpec, state) :: AbstractGameEnv
Create a new game environment, initialized in a given state.
Operations on Environments
AlphaZero.GameInterface.clone — Function
clone(::AbstractGameEnv)
Return an independent copy of the given environment.
AlphaZero.GameInterface.available_actions — Function
available_actions(::AbstractGameEnv)
Return the vector of all available actions.
AlphaZero.GameInterface.apply_random_symmetry! — Function
apply_random_symmetry!(::AbstractGameEnv)
Update a game environment by applying a random symmetry to the current state (see symmetries).
Wrapper for CommonRLInterface.jl
AlphaZero.CommonRLInterfaceWrapper — Module
Utilities for using AlphaZero.jl on RL environments that implement CommonRLInterface.jl.
AlphaZero.CommonRLInterfaceWrapper.Env — Type
Env(rlenv::CommonRLInterface.AbstractEnv; <kwargs>) <: AbstractGameEnv
Wrap an environment implementing the interface defined in CommonRLInterface.jl into an AbstractGameEnv.
Requirements
The following optional methods must be implemented for rlenv:
- clone
- state
- setstate!
- valid_action_mask
- player
- players
Keyword arguments
The following optional functions from GameInterface are not present in CommonRLInterface.jl and can be provided as keyword arguments:
- vectorize_state: must be provided unless states already have type Array{<:Number}
- heuristic_value
- symmetries
- render
- action_string
- parse_action
- read_state
If a function f from this list is not provided, the default implementation calls GI.f(::CommonRLInterface.AbstractEnv, ...).
AlphaZero.CommonRLInterfaceWrapper.Spec — Type
Spec(rlenv::RL.AbstractEnv; kwargs...) = spec(Env(rlenv; kwargs...))