Memory Buffer
AlphaZero.TrainingSample — Type
TrainingSample{State}
Type of a training sample. A sample features the following fields:
- s::State is the state
- π::Vector{Float64} is the recorded MCTS policy for this position
- z::Float64 is the discounted reward cumulated from state s
- t::Float64 is the (average) number of moves remaining before the end of the game
- n::Int is the number of times the state s was recorded
As the last field n suggests, several samples that correspond to the same state can be merged, in which case their π, z and t fields are averaged together.
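The merge described above can be sketched in Python. This is a hedged illustration, not AlphaZero.jl code. `Sample`, `merge` and `merge_by_state` are hypothetical names. Weighting each average by the counts n is also an assumption about how the averaging is done. The weighting makes the result equal the plain mean over every recorded occurrence of the state.

```python
from dataclasses import dataclass
from typing import Hashable, List


@dataclass
class Sample:
    # Hypothetical Python mirror of TrainingSample{State}
    s: Hashable       # the state
    pi: List[float]   # recorded MCTS policy for this position
    z: float          # discounted reward cumulated from state s
    t: float          # (average) number of moves remaining
    n: int = 1        # number of times state s was recorded


def merge(a: Sample, b: Sample) -> Sample:
    """Merge two samples of the same state (assumption: count-weighted averages)."""
    assert a.s == b.s, "only samples of the same state can be merged"
    n = a.n + b.n
    wa, wb = a.n / n, b.n / n
    pi = [wa * x + wb * y for x, y in zip(a.pi, b.pi)]
    return Sample(a.s, pi, wa * a.z + wb * b.z, wa * a.t + wb * b.t, n)


def merge_by_state(samples: List[Sample]) -> List[Sample]:
    """Collapse a list of samples so that each state appears exactly once."""
    merged = {}
    for smp in samples:
        merged[smp.s] = merge(merged[smp.s], smp) if smp.s in merged else smp
    return list(merged.values())
```

For example, merging `Sample("x", [1.0, 0.0], 1.0, 4.0)` with `Sample("x", [0.0, 1.0], -1.0, 2.0)` yields a sample with π = [0.5, 0.5], z = 0.0, t = 3.0 and n = 2.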
AlphaZero.MemoryBuffer — Type
MemoryBuffer(game_spec, size, experience=[])
A circular buffer to hold memory samples.
AlphaZero.get_experience — Method
get_experience(::MemoryBuffer) :: Vector{<:TrainingSample}
Return all samples in the memory buffer.
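A circular buffer of fixed capacity and a `get_experience`-style accessor can be sketched in Python with `collections.deque`. This is an illustration only. The `RingBuffer` class and its `push` method are hypothetical, and the `game_spec` argument of `MemoryBuffer` is omitted. Treating `size` as the maximum number of stored samples is an assumption.

```python
from collections import deque


class RingBuffer:
    """Hypothetical circular buffer: once `size` items are stored,
    each new push evicts the oldest item."""

    def __init__(self, size, experience=()):
        # A deque with maxlen silently drops items from the left when full,
        # so an initial experience longer than `size` keeps only its tail.
        self._items = deque(experience, maxlen=size)

    def push(self, sample):
        self._items.append(sample)

    def get_experience(self):
        # Return all samples currently held, oldest first.
        return list(self._items)
```

Pushing 0, 1, 2, 3, 4 into `RingBuffer(3)` leaves `[2, 3, 4]`. Memory therefore stays bounded while old self-play data is gradually replaced by newer data.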
AlphaZero.push_trace! — Function
push_trace!(mem::MemoryBuffer, trace::Trace, gamma)
Collect samples out of a game trace and add them to the memory buffer.
Here, gamma is the reward discount factor.
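The role of gamma can be illustrated with a Python sketch that turns a finished game into (s, π, z, t) tuples. It walks the trace backwards so that z accumulates the discounted future reward. Several things here are assumptions rather than AlphaZero.jl's actual `Trace` handling:

- the trace is given as three parallel lists;
- every reward is expressed from a single player's perspective;
- t counts the move played from the current state.

`samples_from_trace` is a hypothetical name.

```python
def samples_from_trace(states, policies, rewards, gamma):
    """Hypothetical: convert a game trace into (s, pi, z, t) tuples.

    rewards[i] is assumed to be the reward received after the move
    played from states[i], from one fixed player's perspective.
    """
    assert len(states) == len(policies) == len(rewards)
    n = len(states)
    samples = []
    z = 0.0
    # Iterate from the end of the game: z_i = r_i + gamma * z_{i+1}
    for i in reversed(range(n)):
        z = rewards[i] + gamma * z
        t = float(n - i)  # moves remaining, counting the one played from states[i]
        samples.append((states[i], policies[i], z, t))
    samples.reverse()  # restore chronological order
    return samples
```

With rewards `[0, 0, 1]` and `gamma = 0.5`, the three states receive z = 0.25, 0.5 and 1.0 and t = 3.0, 2.0 and 1.0. Each tuple would then be wrapped in a training sample and pushed into the buffer.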