Training Reports

AlphaZero.Report — Module

Analytical reports generated during training, for debugging and hyperparameter tuning.
AlphaZero.Report.Initial — Type

Report.Initial

Report summarizing the configuration of an agent before training starts.

- num_network_parameters: see Network.num_parameters
- num_network_regularized_parameters: see Network.num_regularized_parameters
- mcts_footprint_per_node: see MCTS.memory_footprint_per_node
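As an illustration, here is a minimal sketch of how such a report might be summarized before training; the `r` argument and the `print_initial_summary` helper are hypothetical, and only the field names come from the type above:

```julia
# Minimal sketch: `r` is assumed to be a `Report.Initial` obtained before
# training starts (the helper name is hypothetical).
function print_initial_summary(r)
    println("Network parameters:             ", r.num_network_parameters)
    println("Regularized network parameters: ", r.num_network_regularized_parameters)
    println("MCTS footprint per node:        ", r.mcts_footprint_per_node)
end
```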
AlphaZero.Report.Iteration — Type

Report.Iteration

Report generated after each training iteration.

- Fields self_play, memory and learning have types Report.SelfPlay, Report.Memory and Report.Learning respectively.
- Fields perfs_self_play, perfs_memory_analysis and perfs_learning are performance reports for the different phases of the iteration, with type Report.Perfs.
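As a quick illustration, the sub-reports can be combined into an overview of an iteration; this is a minimal sketch where the `it` variable and the `iteration_overview` helper are assumptions, and only the field names come from the documentation above:

```julia
# Minimal sketch: `it` is assumed to be a `Report.Iteration`.
function iteration_overview(it)
    total_time = it.perfs_self_play.time +
                 it.perfs_memory_analysis.time +
                 it.perfs_learning.time
    println("Total iteration time: ", total_time, " s")
    println("Self-play speed: ", it.self_play.samples_gen_speed, " samples/s")
    println("Best network replaced: ", it.learning.nn_replaced)
end
```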
AlphaZero.Report.Perfs — Type

Report.Perfs

Performance report for a subroutine.

- time: total time spent, in seconds
- allocated: amount of memory allocated, in bytes
- gc_time: total amount of time spent in the garbage collector
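For example, the garbage collection overhead of a phase can be estimated from these fields; a minimal sketch, assuming `perfs` is a `Report.Perfs` and that gc_time is expressed in the same unit as time (the helper names are hypothetical):

```julia
# Sketch: fraction of a phase's time spent in garbage collection, and
# allocated memory in mebibytes. `perfs` is assumed to be a `Report.Perfs`.
gc_fraction(perfs) = perfs.gc_time / perfs.time
allocated_mib(perfs) = perfs.allocated / 2^20
```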
Self-Play Phase

AlphaZero.Report.SelfPlay — Type

Report.SelfPlay

Report generated after the self-play phase of an iteration.

- samples_gen_speed: average number of samples generated per second
- average_exploration_depth: see MCTS.average_exploration_depth
- mcts_memory_footprint: estimation of the maximal memory footprint of the MCTS tree during self-play, as computed by MCTS.approximate_memory_footprint
- memory_size: number of samples in the memory buffer at the end of the self-play phase
- memory_num_distinct_boards: number of distinct board positions in the memory buffer at the end of the self-play phase
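As a sketch, these fields can be used to estimate how much of the memory buffer consists of duplicate positions (the `sp` variable and the helper name are assumptions):

```julia
# Sketch: `sp` is assumed to be a `Report.SelfPlay`.
# Fraction of buffer entries that are not distinct board positions.
buffer_duplication_ratio(sp) = 1 - sp.memory_num_distinct_boards / sp.memory_size
```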
Memory Analysis Phase

AlphaZero.Report.Memory — Type

Report.Memory

Report generated by the memory analysis phase of an iteration. It features statistics for:

- the whole memory buffer (all_samples::Report.Samples)
- the samples collected during the last self-play iteration (latest_batch::Report.Samples)
- the subsets of the memory buffer corresponding to different game stages (per_game_stage::Vector{Report.StageSamples})

See MemAnalysisParams.
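For instance, one way to use this report is to compare the network's loss on the latest samples with its loss on the whole buffer; a minimal sketch, assuming `mem` is a `Report.Memory`:

```julia
# Sketch: `mem` is assumed to be a `Report.Memory`.
function compare_losses(mem)
    println("Loss on the whole buffer:   ", mem.all_samples.status.loss.L)
    println("Loss on the latest samples: ", mem.latest_batch.status.loss.L)
end
```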
AlphaZero.Report.Samples — Type

Report.Samples

Statistics about a set of samples, as collected during memory analysis.

- num_samples: total number of samples
- num_boards: number of distinct board positions
- Wtot: total weight of the samples
- status: Report.LearningStatus statistics of the current network on the samples
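As a small illustration, the average weight per sample can be derived from these fields (a sketch, assuming `s` is a `Report.Samples` and the helper name is hypothetical):

```julia
# Sketch: average weight per sample in a `Report.Samples` object `s`.
average_weight(s) = s.Wtot / s.num_samples
```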
AlphaZero.Report.StageSamples — Type

Report.StageSamples

Statistics for the samples corresponding to a particular game stage, as collected during memory analysis.

The samples whose statistics are collected in the samples_stats field correspond to historical positions where the number of remaining moves until the end of the game was in the range defined by the min_remaining_length and max_remaining_length fields.
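Here is a minimal sketch printing the network's loss for each game-stage bucket of a memory report; the `mem` variable and the helper name are assumptions, and only the field names come from the types above:

```julia
# Sketch: `mem` is assumed to be a `Report.Memory`.
function print_per_stage_losses(mem)
    for stage in mem.per_game_stage
        range = "$(stage.min_remaining_length)-$(stage.max_remaining_length)"
        println("Remaining moves ", range, ": loss = ",
                stage.samples_stats.status.loss.L)
    end
end
```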
Learning Phase

AlphaZero.Report.Learning — Type

Report.Learning

Report generated at the end of the learning phase of an iteration.

- time_convert, time_loss, time_train and time_eval are the amounts of time (in seconds) spent converting the samples, computing losses, performing gradient updates and evaluating checkpoints, respectively
- initial_status: status before the learning phase, as an object of type Report.LearningStatus
- losses: loss value on each minibatch
- checkpoints: vector of Report.Checkpoint reports
- nn_replaced: true if the best neural network was replaced
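As an illustration, these fields can be aggregated into a short summary of the learning phase; a minimal sketch, assuming `lr` is a `Report.Learning` (the helper name is hypothetical):

```julia
using Statistics: mean

# Sketch: `lr` is assumed to be a `Report.Learning`.
function learning_summary(lr)
    total = lr.time_convert + lr.time_loss + lr.time_train + lr.time_eval
    println("Learning phase time:   ", total, " s")
    println("Mean minibatch loss:   ", mean(lr.losses))
    println("Checkpoints evaluated: ", length(lr.checkpoints))
    println("Best network replaced: ", lr.nn_replaced)
end
```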
AlphaZero.Report.Checkpoint — Type

Report.Checkpoint

Report generated after a checkpoint evaluation.

- batch_id: number of batches after which the checkpoint was computed
- evaluation: evaluation report from the arena, of type Report.Evaluation
- status_after: learning status at the checkpoint, as an object of type Report.LearningStatus
- nn_replaced: true if the current best neural network was updated after the checkpoint
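For example, one can look for the first checkpoint at which the best network was replaced; a minimal sketch, assuming `checkpoints` is a vector of `Report.Checkpoint` objects (e.g. the checkpoints field of a `Report.Learning`) and the helper name is hypothetical:

```julia
# Sketch: return the `batch_id` of the first checkpoint that replaced the
# best network, or `nothing` if no replacement happened.
function first_replacement(checkpoints)
    i = findfirst(c -> c.nn_replaced, checkpoints)
    isnothing(i) ? nothing : checkpoints[i].batch_id
end
```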
AlphaZero.Report.LearningStatus — Type

Report.LearningStatus

Statistics about the performance of the neural network on a subset of the memory buffer.

- loss: detailed loss on the samples, as an object of type Report.Loss
- Hp: average entropy of the $π$ component of the samples (MCTS policy); this quantity is independent of the network and therefore constant during a learning iteration
- Hpnet: average entropy of the network's prescribed policy on the samples
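One quantity that can be derived from these fields is the gap between the entropy of the network policy and the entropy of the MCTS policy it is trained on; a minimal sketch, assuming `st` is a `Report.LearningStatus` and the helper name is hypothetical:

```julia
# Sketch: difference between the network policy entropy and the MCTS policy
# entropy; `st` is assumed to be a `Report.LearningStatus`.
entropy_gap(st) = st.Hpnet - st.Hp
```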
AlphaZero.Report.Loss — Type

Report.Loss

Decomposition of the loss into a sum of terms (all have type Float32).

- L is the total loss: L == Lp + Lv + Lreg + Linv
- Lp is the policy cross-entropy loss term
- Lv is the average value mean square error
- Lreg is the L2 regularization loss term
- Linv is the loss term penalizing the average weight put by the network on invalid actions
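The stated relation between the total loss and its terms can be checked directly on a report; a minimal sketch, assuming `l` is a `Report.Loss` and the helper name is hypothetical:

```julia
# Sketch: check that the total loss matches the sum of its components,
# up to floating-point rounding. `l` is assumed to be a `Report.Loss`.
loss_consistent(l) = isapprox(l.L, l.Lp + l.Lv + l.Lreg + l.Linv; rtol=1e-5)
```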
Evaluations and Benchmarks

AlphaZero.Report.Evaluation — Type

Report.Evaluation

The outcome of evaluating a player against a baseline player.

Two-player Games

- rewards is the sequence of rewards collected by the evaluated player
- avgr is the average reward collected by the evaluated player
- baseline_rewards is nothing

Single-player Games

- rewards is the sequence of rewards collected by the evaluated player
- baseline_rewards is the sequence of rewards collected by the baseline player
- avgr is equal to mean(rewards) - mean(baseline_rewards)

Common Fields

- legend is a string describing the evaluation
- redundancy is the ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
- time is the computing time spent running the evaluation, in seconds
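For single-player games, avgr can be recomputed from the two reward sequences; a minimal sketch, assuming `e` is a `Report.Evaluation` whose baseline_rewards field is not nothing (the helper name is hypothetical):

```julia
using Statistics: mean

# Sketch: recompute `avgr` for a single-player evaluation report `e`.
single_player_avgr(e) = mean(e.rewards) - mean(e.baseline_rewards)
```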
AlphaZero.Report.Benchmark — Type

const Report.Benchmark = Vector{Report.Evaluation}

A benchmark report is a vector of Evaluation objects.
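Since a benchmark report is just a vector, it can be traversed directly; a minimal sketch, assuming `bench` is a `Report.Benchmark` (the helper name is hypothetical):

```julia
# Sketch: print one line per evaluation in a `Report.Benchmark`.
function print_benchmark(bench)
    for e in bench
        println(e.legend, ": avgr = ", e.avgr, " (", e.time, " s)")
    end
end
```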