MCTS
AlphaZero.MCTS — Module
A generic, standalone implementation of Monte Carlo Tree Search. It can be used on any game that implements GameInterface and with any external oracle.
Oracle Interface
An oracle can be any function or callable object.
oracle(state)
evaluates a single state from the current player's perspective and returns a pair (P, V) where:
P is a probability vector on GI.available_actions(GI.init(gspec, state))
V is a scalar estimating the value or win probability for white.
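For illustration, a minimal hand-written oracle might look like the sketch below. It puts a uniform prior on the available actions and returns a neutral value estimate; `dummy_oracle` is a hypothetical name and `gspec` is assumed to be an `AbstractGameSpec` for your game, already in scope.

```julia
using AlphaZero: GI  # GameInterface submodule

# Sketch of a custom oracle: uniform prior, neutral value estimate.
# `gspec` is assumed to be an AbstractGameSpec for your game.
function dummy_oracle(state)
    game = GI.init(gspec, state)
    actions = GI.available_actions(game)
    P = fill(1 / length(actions), length(actions))  # uniform prior over actions
    V = 0.0                                         # neutral value estimate
    return P, V
end
```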
Standard Oracles
AlphaZero.MCTS.RolloutOracle — Type
MCTS.RolloutOracle(game_spec::AbstractGameSpec, γ=1.) <: Function
This oracle estimates the value of a position by simulating a random game from it (a rollout). Moreover, it puts a uniform prior on available actions. Therefore, it can be used to implement the "vanilla" MCTS algorithm.
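Since a rollout oracle is a callable object following the oracle interface above, it can also be queried directly, as in the sketch below (`gspec` is again assumed to be an `AbstractGameSpec` in scope):

```julia
using AlphaZero: GI, MCTS

oracle = MCTS.RolloutOracle(gspec, 0.99)  # rollout oracle with discount γ = 0.99
state = GI.current_state(GI.init(gspec))  # initial state of a fresh game
P, V = oracle(state)                      # uniform prior + rollout-based value estimate
```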
Environment
AlphaZero.MCTS.Env — Type
MCTS.Env(game_spec::AbstractGameSpec, oracle; <keyword args>)
Create and initialize an MCTS environment with a given oracle.
Keyword Arguments
gamma=1.: the reward discount factor
cpuct=1.: exploration constant in the UCT formula
noise_ϵ=0., noise_α=1.: parameters for the Dirichlet exploration noise (see below)
prior_temperature=1.: temperature to apply to the oracle's output to get the prior probability vector used by MCTS.
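As a sketch, an environment combining a rollout oracle with root exploration noise could be created as follows; the particular values are illustrative only, and `gspec` is assumed to be in scope:

```julia
using AlphaZero: MCTS

# Sketch: build an MCTS environment on top of a rollout oracle.
env = MCTS.Env(gspec, MCTS.RolloutOracle(gspec);
    gamma=1.0,           # undiscounted rewards
    cpuct=1.0,           # exploration constant in the UCT formula
    noise_ϵ=0.25,        # weight of the Dirichlet noise at the root (see below)
    noise_α=0.3,         # Dirichlet concentration parameter
    prior_temperature=1.0)
```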
Dirichlet Noise
A naive way to ensure exploration during training is to adopt an ϵ-greedy policy: with probability ϵ, play a random move instead of the move prescribed by MCTS.policy. The problem with this naive strategy is that it may lead the player to make terrible moves at critical moments, thereby biasing the policy evaluation mechanism.
A superior alternative is to add a random bias to the neural prior for the root node during MCTS exploration: instead of using the policy $p$ output by the neural network in the UCT formula, one uses $(1-ϵ)p + ϵη$, where $η$ is drawn once per call to MCTS.explore! from a Dirichlet distribution of parameter $α$.
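As a small numerical illustration of this mixing formula (not AlphaZero.jl internals), using the external Distributions package:

```julia
using Distributions  # external package, used here only to illustrate the formula

p = [0.5, 0.3, 0.2]                # hypothetical neural prior over 3 actions
ϵ, α = 0.25, 0.3
η = rand(Dirichlet(length(p), α))  # symmetric Dirichlet noise, one draw per explore! call
p_root = (1 - ϵ) .* p .+ ϵ .* η    # perturbed prior used at the root node
```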
AlphaZero.MCTS.explore! — Function
MCTS.explore!(env, game, nsims)
Run nsims MCTS simulations from the current state.
AlphaZero.MCTS.policy — Function
MCTS.policy(env, game)
Return the recommended stochastic policy on the current state.
A call to this function must always be preceded by a call to MCTS.explore!.
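A sketch of the typical call sequence, assuming `env` and `gspec` as above and assuming the returned vector is indexed consistently with GI.available_actions(game):

```julia
using AlphaZero: GI, MCTS

game = GI.init(gspec)                     # fresh game in its initial state
MCTS.explore!(env, game, 400)             # run 400 MCTS simulations
probs = MCTS.policy(env, game)            # stochastic policy over available actions
actions = GI.available_actions(game)
GI.play!(game, actions[argmax(probs)])    # here: play the highest-probability action
```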
AlphaZero.MCTS.reset! — Function
MCTS.reset!(env)
Empty the MCTS tree.
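For example, when reusing the same environment across several games, one would typically call it between games (sketch, same assumptions as above):

```julia
using AlphaZero: GI, MCTS

MCTS.reset!(env)        # drop all statistics gathered during the previous game
game = GI.init(gspec)   # start a new game with an empty search tree
```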
Profiling Utilities
AlphaZero.MCTS.memory_footprint_per_node — Function
MCTS.memory_footprint_per_node(gspec)
Return an estimate of the memory footprint of a single MCTS node for the given game (in bytes).
AlphaZero.MCTS.approximate_memory_footprint — Function
MCTS.approximate_memory_footprint(env)
Return an estimate of the memory footprint of the MCTS tree (in bytes).
AlphaZero.MCTS.average_exploration_depth — Function
MCTS.average_exploration_depth(env)
Return the average number of nodes that are traversed during an MCTS simulation, not counting the root.
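A sketch combining the three utilities to inspect an existing environment (`env` and `gspec` assumed as above):

```julia
using AlphaZero: MCTS

per_node = MCTS.memory_footprint_per_node(gspec)   # estimated bytes per tree node
tree_mem = MCTS.approximate_memory_footprint(env)  # estimated bytes for the whole tree
depth    = MCTS.average_exploration_depth(env)     # average simulation depth (excluding the root)
println("tree ≈ $(round(tree_mem / 1e6, digits=1)) MB, ",
        "$(per_node) bytes per node, average depth $(depth)")
```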