# MCTS

`AlphaZero.MCTS` — Module

A generic, standalone implementation of Monte Carlo Tree Search. It can be used on any game that implements `GameInterface` and with any external oracle.

## Oracles

`AlphaZero.MCTS.Oracle` — Type

```julia
MCTS.Oracle{Game}
```

Abstract base type for an oracle. Oracles must be callable:

```julia
(::Oracle)(state)
```

Evaluate a single state from the current player's perspective and return a pair `(P, V)` where:

- `P` is a probability vector on `GI.available_actions(Game(state))`
- `V` is a scalar estimating the value or win probability for white.
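To make the callable interface concrete, here is a minimal sketch of a custom oracle that puts a uniform prior on available actions and always returns a neutral value estimate. The game type `MyGame` and the `GI` alias for the game interface are assumptions made for the sake of the example:

```julia
using AlphaZero: MCTS, GI

# `MyGame` is a hypothetical game type implementing `GameInterface`.

# An oracle returning a uniform prior and a neutral value estimate of 0.
struct UniformOracle{Game} <: MCTS.Oracle{Game} end

function (::UniformOracle{Game})(state) where Game
  n = length(GI.available_actions(Game(state)))
  P = fill(1 / n, n)  # uniform probability vector over available actions
  V = 0.0             # neutral value estimate
  return P, V
end
```

Such an oracle behaves like `RolloutOracle` with the rollout replaced by a constant value estimate.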

`AlphaZero.MCTS.RolloutOracle` — Type

```julia
MCTS.RolloutOracle{Game}(γ=1.)
```

This oracle estimates the value of a position by simulating a random game from it (a rollout). Moreover, it puts a uniform prior on available actions. Therefore, it can be used to implement the "vanilla" MCTS algorithm.

## Environment

`AlphaZero.MCTS.Env` — Type

```julia
MCTS.Env{Game}(oracle; <keyword args>) where Game
```

Create and initialize an MCTS environment with a given `oracle`.

**Keyword Arguments**

- `gamma=1.`: the reward discount factor
- `cpuct=1.`: exploration constant in the UCT formula
- `noise_ϵ=0., noise_α=1.`: parameters for the Dirichlet exploration noise (see below)
- `prior_temperature=1.`: temperature to apply to the oracle's output to get the prior probability vector used by MCTS.
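As a usage sketch (assuming a concrete game type `Game` implementing `GameInterface`), an environment might be built as follows; the keyword values shown are illustrative, not recommendations:

```julia
using AlphaZero

# `Game` is a placeholder for a concrete game type implementing `GameInterface`.
oracle = MCTS.RolloutOracle{Game}()
env = MCTS.Env{Game}(oracle;
  gamma=1.0,     # no reward discounting
  cpuct=1.0,     # UCT exploration constant
  noise_ϵ=0.25,  # weight of the Dirichlet noise at the root
  noise_α=1.0)   # Dirichlet concentration parameter
```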

**Dirichlet Noise**

A naive way to ensure exploration during training is to adopt an ϵ-greedy policy: at every turn, with probability ϵ, play a random move instead of the one prescribed by `MCTS.policy`. The problem with this naive strategy is that it may lead the player to make terrible moves at critical moments, thereby biasing the policy evaluation mechanism.

A superior alternative is to add a random bias to the neural prior for the root node during MCTS exploration: instead of considering the policy $p$ output by the neural network in the UCT formula, one uses $(1-ϵ)p + ϵη$, where $η$ is drawn once per call to `MCTS.explore!` from a Dirichlet distribution of parameter $α$.
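The mixing step itself is simple to state in code. The sketch below applies $(1-ϵ)p + ϵη$ to a prior vector; for illustration, $η$ is given explicitly rather than sampled from a Dirichlet distribution:

```julia
# Mix a prior probability vector `p` with exploration noise `η`.
# Both arguments are probability vectors of the same length.
dirichlet_mix(p, η, ϵ) = (1 - ϵ) .* p .+ ϵ .* η

p = [0.7, 0.2, 0.1]  # prior from the oracle
η = [0.1, 0.3, 0.6]  # in practice, drawn from Dirichlet(α) once per explore! call
p̃ = dirichlet_mix(p, η, 0.25)
# p̃ is again a probability vector, since p and η both sum to one
```

Note that the mixture preserves normalization, so the result can be used directly as a prior in the UCT formula.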

`AlphaZero.MCTS.explore!` — Function

```julia
MCTS.explore!(env, game, nsims)
```

Run `nsims` MCTS simulations from the current state.

`AlphaZero.MCTS.policy` — Function

```julia
MCTS.policy(env, game)
```

Return the recommended stochastic policy on the current state.

A call to this function must always be preceded by a call to `MCTS.explore!`.

`AlphaZero.MCTS.reset!` — Function

```julia
MCTS.reset!(env)
```

Empty the MCTS tree.
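Putting the three functions above together, a typical interaction might look like the following sketch (the game type `Game`, its zero-argument constructor, and the number of simulations are placeholders; the exact shape of the value returned by `MCTS.policy` is an assumption):

```julia
using AlphaZero

env = MCTS.Env{Game}(MCTS.RolloutOracle{Game}())
game = Game()

MCTS.explore!(env, game, 100)  # run 100 simulations from the current state
π = MCTS.policy(env, game)     # stochastic policy over available actions

MCTS.reset!(env)  # empty the tree, e.g. before starting a new game
```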

## Profiling Utilities

`AlphaZero.MCTS.memory_footprint_per_node` — Function

```julia
MCTS.memory_footprint_per_node(env)
```

Return an estimate of the memory footprint of a single node of the MCTS tree (in bytes).

`AlphaZero.MCTS.approximate_memory_footprint` — Function

```julia
MCTS.approximate_memory_footprint(env)
```

Return an estimate of the memory footprint of the MCTS tree (in bytes).

`AlphaZero.MCTS.average_exploration_depth` — Function

```julia
MCTS.average_exploration_depth(env)
```

Return the average number of nodes that are traversed during an MCTS simulation, not counting the root.
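As an illustration, these utilities might be combined to monitor tree growth after a round of simulations (assuming an existing `env` and `game` as in the earlier sketches):

```julia
MCTS.explore!(env, game, 1000)

# Rough memory usage of the search tree, in kilobytes.
println("tree size ≈ ", MCTS.approximate_memory_footprint(env) ÷ 1024, " KB")

# Average number of nodes traversed per simulation (root excluded).
println("average depth: ", MCTS.average_exploration_depth(env))
```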