# MCTS

`AlphaZero.MCTS`

— Module

A generic, standalone implementation of Monte Carlo Tree Search. It can be used on any game that implements `GameInterface` and with any external oracle.

**Oracle Interface**

An oracle can be any function or callable object.

`oracle(state)` evaluates a single state from the current player's perspective and returns a pair `(P, V)` where:

- `P` is a probability vector on `GI.available_actions(GI.init(gspec, state))`
- `V` is a scalar estimating the value or win probability for white.
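For illustration, the simplest possible oracle puts a uniform prior on available actions and returns a neutral value estimate. The sketch below is self-contained and hypothetical: the action count is passed in as a constant rather than queried through `GI`, and `make_uniform_oracle` is not part of the library.

```julia
# Hypothetical sketch of a minimal oracle: uniform prior, neutral value.
# In real use, the number of actions would come from
# GI.available_actions(GI.init(gspec, state)); here it is a fixed constant
# so that the example runs standalone.
function make_uniform_oracle(num_actions::Int)
    return function (state)
        P = fill(1.0 / num_actions, num_actions)  # uniform probability vector
        V = 0.0                                   # neutral value for white
        return P, V
    end
end

oracle = make_uniform_oracle(4)
P, V = oracle(nothing)  # P == [0.25, 0.25, 0.25, 0.25], V == 0.0
```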

## Standard Oracles

`AlphaZero.MCTS.RolloutOracle`

— Type

`MCTS.RolloutOracle(game_spec::AbstractGameSpec, γ=1.) <: Function`

This oracle estimates the value of a position by simulating a random game from it (a rollout). Moreover, it puts a uniform prior on available actions. Therefore, it can be used to implement the "vanilla" MCTS algorithm.
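To illustrate what a rollout computes, here is a self-contained toy sketch (the game below is invented for this example and unrelated to AlphaZero's API): a counter starts at `n`, each random move adds ±1, and the game ends at ±3 with terminal reward ±1; the rollout returns the discounted terminal reward, mirroring a rollout-based value estimate.

```julia
# Toy illustration of a rollout (the game here is an assumption for the sketch):
# a counter moves randomly by ±1 until it hits ±3; the terminal reward is +1 or
# -1, discounted by γ per simulated step.
function rollout_value(n::Int, γ::Float64=1.0)
    discount = 1.0
    while abs(n) < 3
        n += rand((-1, 1))  # uniformly random move
        discount *= γ
    end
    return discount * sign(n)
end
```

Averaging many such rollouts from a position yields the value estimate; the uniform prior comes from treating all available actions as equally likely.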

## Environment

`AlphaZero.MCTS.Env`

— Type

`MCTS.Env(game_spec::AbstractGameSpec, oracle; <keyword args>)`

Create and initialize an MCTS environment with a given `oracle`.

**Keyword Arguments**

- `gamma=1.`: the reward discount factor
- `cpuct=1.`: exploration constant in the UCT formula
- `noise_ϵ=0., noise_α=1.`: parameters for the Dirichlet exploration noise (see below)
- `prior_temperature=1.`: temperature to apply to the oracle's output to get the prior probability vector used by MCTS.
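The usual temperature transform raises each probability to the power 1/τ and renormalizes; note that this convention is an assumption about how `prior_temperature` is applied, and `apply_temperature` below is a hypothetical helper, not a library function.

```julia
# Sketch of the common temperature transform (an assumption about how
# `prior_temperature` is used): raise each probability to 1/τ and renormalize.
# τ = 1 leaves the vector unchanged; τ → 0 sharpens it; τ → ∞ flattens it.
function apply_temperature(p::Vector{Float64}, τ::Float64)
    q = p .^ (1 / τ)
    return q ./ sum(q)
end
```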

**Dirichlet Noise**

A naive way to ensure exploration during training is to adopt an ϵ-greedy policy: with probability ϵ, play a random move instead of following the policy prescribed by `MCTS.policy`. The problem with this naive strategy is that it may lead the player to make terrible moves at critical moments, thereby biasing the policy evaluation mechanism.

A superior alternative is to add a random bias to the neural prior for the root node during MCTS exploration: instead of considering the policy $p$ output by the neural network in the UCT formula, one uses $(1-ϵ)p + ϵη$ where $η$ is drawn once per call to `MCTS.explore!` from a Dirichlet distribution of parameter $α$.
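The mixture above can be sketched as follows. To stay dependency-free, the sample η is drawn only for the special case α = 1, where a Dirichlet sample is obtained by normalizing i.i.d. Exp(1) draws (for general α one would use a Gamma sampler, e.g. from Distributions.jl); `add_root_noise` is a hypothetical name, not part of the library.

```julia
# Sketch of the root-noise mixture (1 - ϵ) p + ϵ η.
# η ~ Dirichlet(α, …, α); for α = 1 this reduces to normalizing i.i.d. Exp(1)
# draws, which keeps the example free of external dependencies.
function add_root_noise(p::Vector{Float64}; ϵ::Float64=0.25)
    e = [-log(rand()) for _ in p]  # i.i.d. Exp(1) samples
    η = e ./ sum(e)                # a Dirichlet(1, …, 1) sample
    return (1 - ϵ) .* p .+ ϵ .* η
end
```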

`AlphaZero.MCTS.explore!`

— Function

`MCTS.explore!(env, game, nsims)`

Run `nsims` MCTS simulations from the current state.

`AlphaZero.MCTS.policy`

— Function

`MCTS.policy(env, game)`

Return the recommended stochastic policy on the current state.

A call to this function must always be preceded by a call to `MCTS.explore!`.

`AlphaZero.MCTS.reset!`

— Function

`MCTS.reset!(env)`

Empty the MCTS tree.
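Putting the pieces together, a typical interaction with the environment looks like the following hypothetical sketch. It assumes a valid game specification `gspec::AbstractGameSpec` is already in scope (e.g. from one of the package's example games), so it is not runnable standalone.

```julia
# Hypothetical workflow, assuming `gspec::AbstractGameSpec` is available:
using AlphaZero: MCTS, GI

env = MCTS.Env(gspec, MCTS.RolloutOracle(gspec); cpuct=1.0)
game = GI.init(gspec)
MCTS.explore!(env, game, 100)  # run 100 simulations from the current state
π = MCTS.policy(env, game)     # stochastic policy over available actions
MCTS.reset!(env)               # empty the tree, e.g. between independent games
```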

## Profiling Utilities

`AlphaZero.MCTS.memory_footprint_per_node`

— Function

`MCTS.memory_footprint_per_node(gspec)`

Return an estimate of the memory footprint of a single MCTS node for the given game (in bytes).

`AlphaZero.MCTS.approximate_memory_footprint`

— Function

`MCTS.approximate_memory_footprint(env)`

Return an estimate of the memory footprint of the MCTS tree (in bytes).

`AlphaZero.MCTS.average_exploration_depth`

— Function

`MCTS.average_exploration_depth(env)`

Return the average number of nodes that are traversed during an MCTS simulation, not counting the root.