Memory Buffer

AlphaZero.TrainingSample — Type

TrainingSample{State}

Type of a training sample. A sample features the following fields:

s::State is the state
π::Vector{Float64} is the recorded MCTS policy for this position
z::Float64 is the cumulative discounted reward obtained from state s onward
t::Float64 is the (average) number of moves remaining before the end of the game
n::Int is the number of times the state s was recorded

As the last field n indicates, several samples that correspond to the same state can be merged into one, in which case their π, z and t fields are averaged together.
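The merging rule can be illustrated with a minimal, self-contained sketch. The `Sample` struct and the `merge_samples` helper below are hypothetical stand-ins written for this example, not the library's own definitions. Weighting each sample by its count n is an assumption of this sketch; the library may instead take a plain mean.

```julia
# Hypothetical stand-in for TrainingSample, for illustration only.
struct Sample{State}
    s::State
    π::Vector{Float64}
    z::Float64
    t::Float64
    n::Int
end

# Merge samples that share the same state.
# ASSUMPTION: each sample is weighted by its count n when averaging.
function merge_samples(samples::Vector{Sample{S}}) where {S}
    @assert !isempty(samples)
    @assert all(e -> e.s == samples[1].s, samples)
    n = sum(e.n for e in samples)
    π = sum(e.n .* e.π for e in samples) ./ n
    z = sum(e.n * e.z for e in samples) / n
    t = sum(e.n * e.t for e in samples) / n
    return Sample{S}(samples[1].s, π, z, t, n)
end

a = Sample("s0", [1.0, 0.0], 1.0, 4.0, 1)
b = Sample("s0", [0.0, 1.0], -1.0, 2.0, 3)
m = merge_samples([a, b])
println((m.π, m.z, m.t, m.n))
```

Here the merged sample carries n = 4, and its π, z and t fields lie between those of `a` and `b`, closer to `b` because it was recorded three times.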


AlphaZero.MemoryBuffer — Type

MemoryBuffer(game_spec, size, experience=[])

A circular buffer that holds training samples.
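For readers unfamiliar with the data structure, the following sketch shows the general behavior of a circular buffer: once it reaches capacity, each new item overwrites the oldest one. The `RingBuffer` type and the `contents` helper are hypothetical names introduced for this example and say nothing about how MemoryBuffer is implemented internally.

```julia
# Hypothetical fixed-capacity buffer, for illustration only.
mutable struct RingBuffer{T}
    items::Vector{T}
    capacity::Int
    next::Int   # slot that the next push overwrites once the buffer is full
end

function RingBuffer{T}(capacity::Int) where {T}
    capacity > 0 || throw(ArgumentError("capacity must be positive"))
    return RingBuffer{T}(T[], capacity, 1)
end

function Base.push!(buf::RingBuffer, x)
    if length(buf.items) < buf.capacity
        push!(buf.items, x)          # still filling up
    else
        buf.items[buf.next] = x      # overwrite the oldest item
        buf.next = mod1(buf.next + 1, buf.capacity)
    end
    return buf
end

# Return the stored items ordered from oldest to newest.
contents(buf::RingBuffer) =
    vcat(buf.items[buf.next:end], buf.items[1:buf.next-1])

buf = RingBuffer{Int}(3)
for x in 1:5
    push!(buf, x)
end
println(contents(buf))
```

With a capacity of 3, pushing the values 1 through 5 leaves only the three most recent values in the buffer.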


AlphaZero.get_experience — Method

get_experience(::MemoryBuffer) :: Vector{<:TrainingSample}

Return all samples in the memory buffer.


AlphaZero.push_trace! — Function

push_trace!(mem::MemoryBuffer, trace::Trace, gamma)

Collect samples out of a game trace and add them to the memory buffer.

Here, gamma is the reward discount factor.
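To make the role of gamma concrete, here is a minimal sketch of how discounted returns can be computed from a sequence of per-move rewards by walking the trace backwards. The `discounted_returns` function and the plain reward vector are hypothetical simplifications for this example: a real trace also carries states and policies, and in a two-player game the sign of the return may depend on which player is to move, which this sketch ignores. The convention used for counting remaining moves is likewise an assumption.

```julia
# Compute z[i] = rewards[i] + gamma * rewards[i+1] + gamma^2 * rewards[i+2] + ...
# by accumulating from the last move back to the first.
function discounted_returns(rewards::Vector{Float64}, gamma::Float64)
    z = similar(rewards)
    acc = 0.0
    for i in reverse(eachindex(rewards))
        acc = rewards[i] + gamma * acc
        z[i] = acc
    end
    return z
end

rewards = [0.0, 0.0, 1.0]            # reward received only on the final move
z = discounted_returns(rewards, 0.5)

# ASSUMPTION: the current move is counted among the remaining moves.
remaining = [Float64(length(rewards) - i + 1) for i in eachindex(rewards)]

println(z)
println(remaining)
```

With gamma = 1 every position in the trace would receive the same return as the final outcome; smaller values of gamma shrink the return assigned to positions that are further from the rewarded move.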
