Memory Buffer
AlphaZero.TrainingSample — Type
TrainingSample{Board}
Type of a training sample. A sample features the following fields:
- b::Board is the board position (by convention, white is to play)
- π::Vector{Float64} is the recorded MCTS policy for this position
- z::Float64 is the reward collected at the end of the game
- t::Float64 is the number of moves remaining before the end of the game
- n::Int is the number of times the board position b was recorded
As revealed by the last field n, several samples that correspond to the same board position can be merged, in which case the π, z and t fields are averaged together.
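The merge described above can be illustrated with a minimal sketch. Everything here is hypothetical: ToySample and merge_toy_samples are stand-ins and not part of AlphaZero. The sketch also assumes the averages are weighted by each sample's count n; the text only says the fields are averaged, so the library may weight them differently.

```julia
# Hypothetical stand-in for TrainingSample; not the library's definition.
struct ToySample{Board}
  b::Board
  π::Vector{Float64}
  z::Float64
  t::Float64
  n::Int
end

# Merge samples that share a board position.
# Assumption: π, z and t are averaged with weights given by n.
function merge_toy_samples(samples::Vector{ToySample{B}}) where {B}
  @assert !isempty(samples)
  b = first(samples).b
  @assert all(s -> s.b == b, samples)
  n = sum(s.n for s in samples)
  policy = sum(s.n .* s.π for s in samples) ./ n
  z = sum(s.n * s.z for s in samples) / n
  t = sum(s.n * s.t for s in samples) / n
  return ToySample{B}(b, policy, z, t, n)
end

s1 = ToySample("board-A", [1.0, 0.0], 1.0, 4.0, 1)
s2 = ToySample("board-A", [0.0, 1.0], -1.0, 2.0, 3)
merged = merge_toy_samples([s1, s2])
```

With weights, merging is insensitive to the order in which samples are combined, since the resulting n records how many raw observations each average already represents.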
AlphaZero.MemoryBuffer — Type
MemoryBuffer{Board}
A circular buffer to hold memory samples.
How to use
- Use new_batch!(mem) to start a new batch, typically once per iteration before self-play.
- Use push_sample!(mem, board, policy, white_playing, turn) to record a sample during a game, where turn is the number of actions that have been played by both players since the start of the game.
- Use push_game!(mem, white_reward, game_length) when a game terminates for which samples have been collected.
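To illustrate why samples are recorded during the game but only finalized when it ends, here is a toy sketch of the last two steps. It does not use the library's types: ToyBuffer, toy_push_sample! and toy_push_game! are hypothetical names. It assumes that z is the final reward seen from the side to move (negated when black was playing) and that t is game_length minus turn; both are guesses consistent with the field descriptions above, not confirmed behavior. The batch bookkeeping done by new_batch! is left out.

```julia
# Hypothetical toy types; not the library's MemoryBuffer.
struct PendingSample
  board::String
  policy::Vector{Float64}
  white_playing::Bool
  turn::Int
end

struct FinishedSample
  board::String
  policy::Vector{Float64}
  z::Float64
  t::Float64
end

struct ToyBuffer
  pending::Vector{PendingSample}    # samples of the game in progress
  finished::Vector{FinishedSample}  # samples with known reward and remaining moves
end
ToyBuffer() = ToyBuffer(PendingSample[], FinishedSample[])

# Record a position; the reward is not known yet.
toy_push_sample!(mem, board, policy, white_playing, turn) =
  push!(mem.pending, PendingSample(board, policy, white_playing, turn))

# Finalize all pending samples once the outcome is known.
function toy_push_game!(mem, white_reward, game_length)
  for s in mem.pending
    # Assumption: reward is expressed from the perspective of the player to move.
    z = s.white_playing ? white_reward : -white_reward
    # Assumption: moves remaining = total game length - moves already played.
    t = game_length - s.turn
    push!(mem.finished, FinishedSample(s.board, s.policy, z, t))
  end
  empty!(mem.pending)
  return mem
end

mem = ToyBuffer()
toy_push_sample!(mem, "pos0", [0.5, 0.5], true, 0)
toy_push_sample!(mem, "pos1", [0.9, 0.1], false, 1)
toy_push_game!(mem, 1.0, 2)
```

After the final call, the pending list is empty and each finished sample carries a reward and a count of remaining moves that could not have been known when the position was first recorded.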