Memory Buffer
AlphaZero.TrainingSample - Type
TrainingSample{State}
Type of a training sample. A sample features the following fields:
s::State is the state
π::Vector{Float64} is the recorded MCTS policy for this position
z::Float64 is the discounted reward cumulated from state s
t::Float64 is the (average) number of moves remaining before the end of the game
n::Int is the number of times the state s was recorded
As revealed by the last field n, several samples that correspond to the same state can be merged, in which case the π, z and t fields are averaged together.
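The merging rule can be illustrated with a small self-contained sketch. Sample and merge_samples below are hypothetical stand-ins, not the library's definitions, and weighting each sample by its count n is an assumption about how the averaging is done.

```julia
# Hypothetical stand-in for TrainingSample; not the library's definition.
struct Sample{State}
    s::State
    π::Vector{Float64}
    z::Float64
    t::Float64
    n::Int
end

# Merge samples recorded for the same state.
# Assumption: each sample contributes in proportion to its count n.
function merge_samples(samples::Vector{Sample{S}}) where {S}
    @assert !isempty(samples)
    s = samples[1].s
    @assert all(e -> e.s == s, samples)
    n = sum(e.n for e in samples)
    w = [e.n / n for e in samples]
    policy = sum(w[i] .* samples[i].π for i in eachindex(samples))
    z = sum(w[i] * samples[i].z for i in eachindex(samples))
    t = sum(w[i] * samples[i].t for i in eachindex(samples))
    return Sample{S}(s, policy, z, t, n)
end

a = Sample("x", [1.0, 0.0], 1.0, 4.0, 1)
b = Sample("x", [0.0, 1.0], -1.0, 2.0, 3)
m = merge_samples([a, b])
```

In this example the merged sample has n equal to 4, and its π, z and t values lie three quarters of the way toward those of b, since b was recorded three times and a only once.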
AlphaZero.MemoryBuffer - Type
MemoryBuffer(game_spec, size, experience=[])
A circular buffer to hold memory samples.
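For readers unfamiliar with the data structure, here is a minimal sketch of a circular buffer: once it is full, each new element overwrites the oldest one. RingBuffer is a hypothetical type written for illustration and makes no claim about how MemoryBuffer is implemented internally.

```julia
# Hypothetical fixed-capacity buffer; not the library's MemoryBuffer.
mutable struct RingBuffer{T}
    data::Vector{T}
    capacity::Int
    next::Int  # index of the slot that the next push will write to
end

# Start empty, with the write position at the first slot.
RingBuffer{T}(capacity::Int) where {T} = RingBuffer{T}(T[], capacity, 1)

function Base.push!(b::RingBuffer, x)
    if length(b.data) < b.capacity
        push!(b.data, x)      # still filling up
    else
        b.data[b.next] = x    # full: overwrite the oldest element
    end
    b.next = mod1(b.next + 1, b.capacity)
    return b
end

buf = RingBuffer{Int}(3)
for x in 1:5
    push!(buf, x)
end
# buf.data is now [4, 5, 3]: elements 1 and 2 were overwritten.
```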
AlphaZero.get_experience - Method
get_experience(::MemoryBuffer) :: Vector{<:TrainingSample}
Return all samples in the memory buffer.
AlphaZero.push_trace! - Function
push_trace!(mem::MemoryBuffer, trace::Trace, gamma)
Collect samples out of a game trace and add them to the memory buffer.
Here, gamma is the reward discount factor.
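To show the role of the discount factor, the sketch below computes a discounted cumulative reward for every position of a game from the sequence of per-step rewards. discounted_returns is a hypothetical helper, not part of the library; it assumes rewards[k] is the reward obtained after leaving state k and ignores any per-player sign convention.

```julia
# Hypothetical helper, not the library's code.
# Assumption: rewards[k] is the reward received after leaving state k.
function discounted_returns(rewards::Vector{Float64}, gamma::Float64)
    z = zeros(length(rewards))
    acc = 0.0
    # Walk the game backwards so that z[k] = rewards[k] + gamma * z[k + 1].
    for k in length(rewards):-1:1
        acc = rewards[k] + gamma * acc
        z[k] = acc
    end
    return z
end

returns = discounted_returns([0.0, 0.0, 1.0], 0.5)
# returns == [0.25, 0.5, 1.0]
```

With gamma equal to 1, every position of the game would receive the same value as the final reward; smaller values make rewards that are further away count less.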