Training Reports

AlphaZero.Report (Module)

Analytical reports generated during training, for debugging and hyperparameter tuning.

AlphaZero.Report.Perfs (Type)
Report.Perfs

Performance report for a subroutine.

  • time: total time spent, in seconds
  • allocated: amount of memory allocated, in bytes
  • gc_time: total amount of time spent in the garbage collector
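
As an illustration, these fields map naturally onto what Julia's Base.@timed macro returns. The sketch below is hypothetical: it assumes Report.Perfs can be built with its default positional constructor, with the fields in the documented order.

```julia
using AlphaZero: Report

# Hypothetical helper: time an arbitrary subroutine `f` and package the
# measurements as a Report.Perfs (assumed positional constructor with the
# documented field order: time, allocated, gc_time).
function measure_perfs(f)
  stats = @timed f()  # Base.@timed returns `value`, `time`, `bytes`, `gctime`, ...
  perfs = Report.Perfs(stats.time, stats.bytes, stats.gctime)
  return stats.value, perfs
end
```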

Self-Play Phase

AlphaZero.Report.SelfPlay (Type)
Report.SelfPlay

Report generated after the self-play phase of an iteration.

  • inference_time_ratio: see MCTS.inference_time_ratio
  • samples_gen_speed: average number of samples generated per second
  • average_exploration_depth: see MCTS.average_exploration_depth
  • mcts_memory_footprint: estimate of the maximum memory footprint of the MCTS tree during self-play, as computed by MCTS.approximate_memory_footprint
  • memory_size: number of samples in the memory buffer at the end of the self-play phase
  • memory_num_distinct_boards: number of distinct board positions in the memory buffer at the end of the self-play phase
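
For instance, a quick sanity check on a self-play report might look as follows. This is only a sketch: `sp` stands for a Report.SelfPlay object obtained from your training loop (not shown here), and only the documented fields are accessed.

```julia
# Hypothetical diagnostic on a Report.SelfPlay object `sp`.
function check_self_play(sp)
  # Fraction of self-play time spent on network inference.
  if sp.inference_time_ratio > 0.5
    @warn "More than half of self-play time is spent on network inference"
  end
  # Share of duplicate board positions in the memory buffer.
  dup = 1 - sp.memory_num_distinct_boards / sp.memory_size
  println("samples/s: ", round(sp.samples_gen_speed, digits=1),
          ", avg. depth: ", round(sp.average_exploration_depth, digits=1),
          ", duplicate boards: ", round(100 * dup, digits=1), "%")
end
```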

Memory Analysis Phase

AlphaZero.Report.Memory (Type)
Report.Memory

Report generated by the memory analysis phase of an iteration. It features statistics for

  • the whole memory buffer (all_samples::Report.Samples)
  • the samples collected during the last self-play iteration (latest_batch::Report.Samples)
  • the subsets of the memory buffer corresponding to different game stages (per_game_stage::Vector{Report.StageSamples})

See MemAnalysisParams.

AlphaZero.Report.Samples (Type)
Report.Samples

Statistics about a set of samples, as collected during memory analysis.

  • num_samples: total number of samples
  • num_boards: number of distinct board positions
  • Wtot: total weight of the samples
  • status: learning status of the current network on these samples, as an object of type Report.LearningStatus
AlphaZero.Report.StageSamples (Type)
Report.StageSamples

Statistics for the samples corresponding to a particular game stage, as collected during memory analysis.

The samples whose statistics are collected in the samples_stats field correspond to historical positions where the number of remaining moves until the end of the game was in the range defined by the min_remaining_length and max_remaining_length fields.

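Putting the three types above together, a memory report can be summarized along the lines of the following sketch. The `mem` argument stands for a Report.Memory object; only documented fields are used.

```julia
# Hypothetical summary of a Report.Memory object `mem`.
function summarize_memory(mem)
  whole = mem.all_samples
  println("Whole buffer: ", whole.num_samples, " samples over ",
          whole.num_boards, " distinct boards (total weight ", whole.Wtot, ")")
  println("Latest batch loss: ", mem.latest_batch.status.loss.L)
  for stage in mem.per_game_stage
    s = stage.samples_stats
    println(stage.min_remaining_length, "-", stage.max_remaining_length,
            " moves remaining: ", s.num_samples, " samples, loss ", s.status.loss.L)
  end
end
```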

Learning Phase

AlphaZero.Report.Learning (Type)
Report.Learning

Report generated at the end of the learning phase of an iteration.

  • time_convert, time_loss, time_train and time_eval are the amounts of time (in seconds) spent converting the samples, computing losses, performing gradient updates and evaluating checkpoints, respectively
  • initial_status: status before the learning phase, as an object of type Report.LearningStatus
  • losses: loss value on each minibatch
  • checkpoints: vector of Report.Checkpoint reports
  • nn_replaced: true if the best neural network was replaced
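
As an example, the timing fields can be used to see where a learning phase spends its time. This is a sketch; `rep` stands for a Report.Learning object and only documented fields are accessed.

```julia
# Hypothetical timing breakdown for a Report.Learning object `rep`.
function learning_time_breakdown(rep)
  times = [
    "converting samples"     => rep.time_convert,
    "computing losses"       => rep.time_loss,
    "gradient updates"       => rep.time_train,
    "evaluating checkpoints" => rep.time_eval]
  total = sum(last, times)
  for (label, t) in times
    println(rpad(label, 24), round(100 * t / total, digits=1), "%")
  end
  println("Best network replaced: ", rep.nn_replaced)
end
```
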
AlphaZero.Report.Checkpoint (Type)
Report.Checkpoint

Report generated after a checkpoint evaluation.

  • batch_id: number of batches after which the checkpoint was computed
  • status_after: learning status at the checkpoint, as an object of type Report.LearningStatus
  • reward: average reward collected by the contender network
  • redundancy: ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
  • nn_replaced: true if the current best neural network was updated after the checkpoint
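
For instance, one can scan the checkpoints vector of a learning report to see when, if ever, the contender network first replaced the current best one. The function below is a sketch using only the documented fields; the 0.5 redundancy threshold is an arbitrary illustration.

```julia
# Hypothetical scan over a vector of Report.Checkpoint objects.
function first_replacement(checkpoints)
  i = findfirst(c -> c.nn_replaced, checkpoints)
  if isnothing(i)
    println("The best network was not replaced at any checkpoint.")
    return
  end
  c = checkpoints[i]
  println("Replaced after ", c.batch_id, " batches (avg. reward ", c.reward,
          ", redundancy ", round(100 * c.redundancy, digits=1), "%)")
  if c.redundancy > 0.5  # arbitrary threshold, for illustration only
    @warn "High redundancy: consider increasing the move selection temperature"
  end
end
```
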
AlphaZero.Report.LearningStatus (Type)
Report.LearningStatus

Statistics about the performance of the neural network on a subset of the memory buffer.

  • loss: detailed loss on the samples, as an object of type Report.Loss
  • Hp: average entropy of the π component of the samples (MCTS policy); this quantity is independent of the network and therefore constant during a learning iteration
  • Hpnet: average entropy of the network's prescribed policy on the samples
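
Since Hp does not depend on the network while Hpnet does, their gap gives a rough indication of how much more concentrated the network's policy is than the search policy it is trained on. A minimal sketch:

```julia
# Hypothetical helper: entropy gap between the MCTS policy (Hp) and the
# network's policy (Hpnet) for a Report.LearningStatus object `st`.
# A positive gap means the network's policy is more concentrated.
entropy_gap(st) = st.Hp - st.Hpnet
```
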
AlphaZero.Report.Loss (Type)
Report.Loss

Decomposition of the loss into a sum of terms (all of type Float32).

  • L is the total loss: L == Lp + Lv + Lreg + Linv
  • Lp is the policy cross-entropy loss term
  • Lv is the mean squared error on the value component
  • Lreg is the L2 regularization loss term
  • Linv is the loss term penalizing the average weight put by the network on invalid actions
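
The decomposition above can be checked numerically on any Report.Loss object, up to Float32 rounding. A minimal sketch:

```julia
# Hypothetical sanity check: residual of the documented decomposition
# L == Lp + Lv + Lreg + Linv for a Report.Loss object `l`.
loss_residual(l) = l.L - (l.Lp + l.Lv + l.Lreg + l.Linv)
# One would expect abs(loss_residual(l)) to be negligible compared to l.L.
```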