Memory Buffer
AlphaZero.TrainingSample — Type
TrainingSample{Board}
Type of a training sample. A sample features the following fields:
- b::Board is the board position (by convention, white is to play)
- π::Vector{Float64} is the recorded MCTS policy for this position
- z::Float64 is the reward collected at the end of the game
- t::Float64 is the number of moves remaining before the end of the game
- n::Int is the number of times the board position b was recorded
As revealed by the last field n, several samples that correspond to the same board position can be merged, in which case the π, z and t fields are averaged together.
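The merging described above can be sketched as follows. This is a minimal illustration, not the library's implementation: Sample and merge_samples are hypothetical names introduced here, and weighting each average by n is an assumption (the text only says the fields are averaged together).

```julia
# Hypothetical stand-in for TrainingSample{Board}; field names follow the text above.
struct Sample{Board}
    b::Board
    π::Vector{Float64}
    z::Float64
    t::Float64
    n::Int
end

# Merge samples that share the same board position.
# Assumption of this sketch: each average is weighted by the count n.
function merge_samples(samples::Vector{Sample{B}}) where {B}
    @assert !isempty(samples)
    @assert all(s -> s.b == samples[1].b, samples)
    total = sum(s.n for s in samples)
    policy = sum(s.n .* s.π for s in samples) ./ total
    z = sum(s.n * s.z for s in samples) / total
    t = sum(s.n * s.t for s in samples) / total
    return Sample{B}(samples[1].b, policy, z, t, total)
end

s1 = Sample("board-A", [0.5, 0.5], 1.0, 10.0, 1)
s2 = Sample("board-A", [1.0, 0.0], -1.0, 20.0, 3)
m = merge_samples([s1, s2])
```

Here the merged sample keeps the shared board, carries n = 4, and its π, z and t fields lie between those of the two inputs, closer to s2 because it was recorded three times.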
AlphaZero.MemoryBuffer — Type
MemoryBuffer{Board}
A circular buffer to hold memory samples.
How to use
- Use new_batch!(mem) to start a new batch, typically once per iteration before self-play.
- Use push_sample!(mem, board, policy, white_playing, turn) to record a sample during a game, where turn is the number of actions that have been played by both players since the start of the game.
- Use push_game!(mem, white_reward, game_length) when a game terminates for which samples have been collected.
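To make the calling sequence concrete, here is a self-contained toy model of the three functions. ToyBuffer and ToySample are hypothetical types written for this sketch and are not the library's MemoryBuffer. The sketch assumes that the reward is negated for positions where black was playing, that t is computed as game_length - turn, and that the oldest samples are evicted once the capacity is exceeded; none of these details is confirmed by the text above.

```julia
# Hypothetical finalized sample (the merge count n is omitted for brevity).
struct ToySample{Board}
    b::Board
    π::Vector{Float64}
    z::Float64
    t::Float64
end

# Hypothetical stand-in for MemoryBuffer{Board}.
mutable struct ToyBuffer{Board}
    capacity::Int
    samples::Vector{ToySample{Board}}                        # finalized, oldest first
    pending::Vector{Tuple{Board,Vector{Float64},Bool,Int}}   # current game
    batch_size::Int                                          # samples since last new_batch!
end

ToyBuffer{Board}(capacity::Int) where {Board} = ToyBuffer{Board}(
    capacity, ToySample{Board}[], Tuple{Board,Vector{Float64},Bool,Int}[], 0)

# Start a new batch: reset the per-batch counter.
new_batch!(mem::ToyBuffer) = (mem.batch_size = 0; mem)

# Record a position; reward and remaining moves are unknown until the game ends.
function push_sample!(mem::ToyBuffer, board, policy, white_playing, turn)
    push!(mem.pending, (board, policy, white_playing, turn))
    return mem
end

# Finalize the pending samples once the outcome of the game is known.
function push_game!(mem::ToyBuffer{B}, white_reward, game_length) where {B}
    for (board, policy, white_playing, turn) in mem.pending
        z = white_playing ? white_reward : -white_reward   # assumed sign convention
        t = float(game_length - turn)                       # moves remaining
        push!(mem.samples, ToySample{B}(board, policy, z, t))
        # Circular behaviour: drop the oldest sample when over capacity.
        length(mem.samples) > mem.capacity && popfirst!(mem.samples)
        mem.batch_size += 1
    end
    empty!(mem.pending)
    return mem
end

mem = ToyBuffer{String}(3)
new_batch!(mem)
push_sample!(mem, "pos0", [0.7, 0.3], true, 0)
push_sample!(mem, "pos1", [0.2, 0.8], false, 1)
push_game!(mem, 1.0, 2)
push_sample!(mem, "pos2", [1.0, 0.0], true, 0)
push_sample!(mem, "pos3", [0.5, 0.5], false, 1)
push_game!(mem, -1.0, 2)
```

After the second game the toy buffer holds three samples ("pos1", "pos2", "pos3"): the capacity of 3 forced out "pos0", which is the circular behaviour in miniature.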