MCTS

AlphaZero.MCTS (Module)

A generic, standalone implementation of Monte Carlo Tree Search. It can be used on any game that implements GameInterface and with any external oracle.

Both a synchronous and an asynchronous version are implemented, and they share most of their code. When browsing the sources for the first time, we recommend that you study the synchronous version first.


Oracles

AlphaZero.MCTS.evaluate (Function)
MCTS.evaluate(oracle::Oracle, state)

Evaluate a single state from the current player's perspective.

Return a pair (P, V) where:

  • P is a probability vector on GI.available_actions(Game(state))
  • V is a scalar estimating the value or win probability for white.
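
As an illustration, a custom oracle can be defined by subtyping MCTS.Oracle and implementing this function. The sketch below is an assumption-laden example, not part of the library: `MyGame` stands for a hypothetical game type implementing GameInterface, and the oracle simply returns a uniform prior with a neutral value estimate.

```julia
# Sketch of a custom oracle. MyGame is a hypothetical game type implementing
# GameInterface; it is not part of this documentation.
struct UniformOracle <: MCTS.Oracle{MyGame} end

function MCTS.evaluate(::UniformOracle, state)
    actions = GI.available_actions(MyGame(state))
    P = fill(1 / length(actions), length(actions))  # uniform prior over legal actions
    V = 0.0                                         # neutral value estimate
    return P, V
end
```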
AlphaZero.MCTS.evaluate_batch (Function)
MCTS.evaluate_batch(oracle::Oracle, states)

Evaluate a batch of states.

Expect a vector of states and return a vector of (P, V) pairs.

A default implementation is provided that calls MCTS.evaluate sequentially on each position.
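
This default amounts to something like the following sketch (not the library's verbatim code):

```julia
# Sketch of the documented default behavior: evaluate each state sequentially.
evaluate_batch_default(oracle, states) = [MCTS.evaluate(oracle, state) for state in states]
```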

AlphaZero.MCTS.RolloutOracle (Type)
MCTS.RolloutOracle{Game}(γ=1.) <: MCTS.Oracle{Game}

This oracle estimates the value of a position by simulating a random game from it (a rollout). Moreover, it puts a uniform prior on available actions. Therefore, it can be used to implement the "vanilla" MCTS algorithm.
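
For example, such an oracle could be queried directly as sketched below. Here, `TicTacToe` stands for any game type implementing GameInterface and `state` for any valid state of that game; both are assumptions made for illustration only.

```julia
# Hypothetical usage sketch: query a rollout oracle on a game state.
oracle = MCTS.RolloutOracle{TicTacToe}()  # uniform prior, value estimated by a random rollout
P, V = MCTS.evaluate(oracle, state)       # P: prior over available actions, V: value estimate
```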


Environment

AlphaZero.MCTS.Env (Type)
MCTS.Env{Game}(oracle; <keyword args>) where Game

Create and initialize an MCTS environment with a given oracle.

Keyword Arguments

  • nworkers=1: number of asynchronous workers (see below)
  • fill_batches=false: if true, a constant batch size is enforced for evaluation requests, by completing batches with dummy entries if necessary
  • gamma=1.: the reward discount factor
  • cpuct=1.: exploration constant in the UCT formula
  • noise_ϵ=0., noise_α=1.: parameters for the Dirichlet exploration noise (see below)
  • prior_temperature=1.: temperature to apply to the oracle's output to get the prior probability vector used by MCTS.
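
For instance, a synchronous environment with a rollout oracle and a custom exploration constant might be created as sketched below (`TicTacToe` is again a stand-in for any game implementing GameInterface):

```julia
# Sketch: create a synchronous MCTS environment with custom parameters.
oracle = MCTS.RolloutOracle{TicTacToe}()
env = MCTS.Env{TicTacToe}(oracle; cpuct=2., noise_ϵ=0.25, noise_α=1.)
```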

Asynchronous MCTS

  • If nworkers == 1, MCTS is run in a synchronous fashion and the oracle is invoked through MCTS.evaluate.

  • If nworkers > 1, nworkers asynchronous workers are spawned, along with an additional task to serve state evaluation requests. Such requests are processed by batches of size nworkers using MCTS.evaluate_batch.
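
As a sketch, an asynchronous environment that batches evaluation requests could be created as follows; whether batching actually pays off depends on how the oracle implements MCTS.evaluate_batch. As before, `TicTacToe` and `oracle` are placeholders.

```julia
# Sketch: with nworkers > 1, state evaluations are served by batches of size
# nworkers through MCTS.evaluate_batch, so a batched oracle (e.g. a neural
# network) can amortize its evaluation cost.
env = MCTS.Env{TicTacToe}(oracle; nworkers=32, fill_batches=true)
```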

Dirichlet Noise

A naive way to ensure exploration during training is to adopt an ϵ-greedy policy: at every turn, with probability ϵ, a random move is played instead of the one prescribed by MCTS.policy. The problem with this naive strategy is that it may lead the player to make terrible moves at critical moments, thereby biasing the policy evaluation mechanism.

A superior alternative is to add a random bias to the neural prior for the root node during MCTS exploration: instead of considering the policy $p$ output by the neural network in the UCT formula, one uses $(1-ϵ)p + ϵη$ where $η$ is drawn once per call to MCTS.explore! from a Dirichlet distribution of parameter $α$.
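
Concretely, computing the noisy root prior amounts to the following sketch, which assumes the Distributions package for sampling and a prior vector `p` over the root's available actions:

```julia
using Distributions  # assumed here only to sample the Dirichlet noise

# Sketch: mix the root prior p with Dirichlet noise, as in the formula above.
function noisy_prior(p::Vector{Float64}, ϵ::Float64, α::Float64)
    η = rand(Dirichlet(length(p), α))  # one noise vector per call to MCTS.explore!
    return (1 - ϵ) .* p .+ ϵ .* η
end
```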


Profiling Utilities