MCTS

AlphaZero.MCTS — Module

A generic, standalone implementation of Monte Carlo Tree Search. It can be used on any game that implements GameInterface and with any external oracle.


Oracles

AlphaZero.MCTS.Oracle — Type
MCTS.Oracle{Game}

Abstract base type for an oracle. Oracles must be callable:

(::Oracle)(state)

Evaluate a single state from the current player's perspective.

Return a pair (P, V) where:

  • P is a probability vector on GI.available_actions(Game(state))
  • V is a scalar estimating the value or win probability for white.
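For illustration, here is a minimal sketch of a custom oracle satisfying this interface. The NaiveOracle name is hypothetical; it returns a uniform prior over the available actions and a neutral value estimate.

struct NaiveOracle{Game} <: MCTS.Oracle{Game} end

function (::NaiveOracle{Game})(state) where Game
  # Uniform prior over the actions available in `state`.
  n = length(GI.available_actions(Game(state)))
  P = fill(1 / n, n)
  # Neutral value estimate.
  V = 0.0
  return P, V
end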
AlphaZero.MCTS.RolloutOracle — Type
MCTS.RolloutOracle{Game}(γ=1.)

This oracle estimates the value of a position by simulating a random game from it (a rollout). Moreover, it puts a uniform prior on available actions. Therefore, it can be used to implement the "vanilla" MCTS algorithm.
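As a rough illustration of what a rollout computes, the sketch below plays uniformly random moves until the game ends and accumulates the discounted reward. The helpers game_over, random_action, and play_and_get_reward are hypothetical stand-ins rather than actual GameInterface functions, and perspective handling (whose reward is being measured) is left out.

function rollout_value(game, γ=1.)
  # Hypothetical helpers: game_over, random_action, play_and_get_reward.
  value, discount = 0.0, 1.0
  while !game_over(game)
    a = random_action(game)            # pick a uniformly random legal action
    r = play_and_get_reward(game, a)   # advance the game and observe the reward
    value += discount * r
    discount *= γ
  end
  return value
end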


Environment

AlphaZero.MCTS.Env — Type
MCTS.Env{Game}(oracle; <keyword args>) where Game

Create and initialize an MCTS environment with a given oracle.

Keyword Arguments

  • gamma=1.: the reward discount factor
  • cpuct=1.: exploration constant in the UCT formula
  • noise_ϵ=0., noise_α=1.: parameters for the Dirichlet exploration noise (see below)
  • prior_temperature=1.: temperature to apply to the oracle's output to get the prior probability vector used by MCTS.
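For example, assuming a game module MyGame that implements GameInterface (the name is a placeholder), an environment combining a rollout oracle with a larger exploration constant could be created as follows:

oracle = MCTS.RolloutOracle{MyGame}()
env = MCTS.Env{MyGame}(oracle;
  gamma=1.,             # no reward discounting
  cpuct=2.,             # stronger exploration in the UCT formula
  noise_ϵ=0.25,         # mix 25% Dirichlet noise into the root prior
  noise_α=1.,
  prior_temperature=1.)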

Dirichlet Noise

A naive way to ensure exploration during training is to adopt an ϵ-greedy policy: at every turn, with probability ϵ, play a random move instead of the one prescribed by MCTS.policy. The problem with this naive strategy is that it may lead the player to make terrible moves at critical moments, thereby biasing the policy evaluation mechanism.

A superior alternative is to add a random bias to the neural prior for the root node during MCTS exploration: instead of considering the policy $p$ output by the neural network in the UCT formula, one uses $(1-ϵ)p + ϵη$ where $η$ is drawn once per call to MCTS.explore! from a Dirichlet distribution of parameter $α$.
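To make the mixing step concrete, here is a small sketch using the Dirichlet distribution from the external Distributions.jl package (the library's internal implementation may differ):

using Distributions

# Blend a symmetric Dirichlet(α) sample into the root prior p.
function noisy_prior(p::AbstractVector, ϵ, α)
  η = rand(Dirichlet(length(p), α))   # symmetric Dirichlet sample with parameter α
  return (1 - ϵ) .* p .+ ϵ .* η       # (1-ϵ)p + ϵη, still sums to one
end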


Profiling Utilities