Training Reports

AlphaZero.Report — Module

Analytical reports generated during training, for debugging and hyperparameter tuning.
AlphaZero.Report.Initial — Type

Report.Initial

Report summarizing the configuration of an agent before training starts.

- num_network_parameters: see Network.num_parameters
- num_network_regularized_parameters: see Network.num_regularized_parameters
- mcts_footprint_per_node: see MCTS.memory_footprint_per_node
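As an illustration, here is a minimal sketch of how such a report might be summarized before training; the `r` argument and the `print_initial_summary` helper are hypothetical, and only the field names come from the type above:

```julia
# Minimal sketch: `r` is assumed to be a `Report.Initial` obtained before
# training starts (the helper name is hypothetical).
function print_initial_summary(r)
    println("Network parameters:             ", r.num_network_parameters)
    println("Regularized network parameters: ", r.num_network_regularized_parameters)
    println("MCTS footprint per node:        ", r.mcts_footprint_per_node)
end
```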
AlphaZero.Report.Iteration — Type

Report.Iteration

Report generated after each training iteration.

- Fields self_play, memory and learning have types Report.SelfPlay, Report.Memory and Report.Learning respectively.
- Fields perfs_self_play, perfs_memory_analysis and perfs_learning are performance reports for the different phases of the iteration, with type Report.Perfs.
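As a quick illustration, the sub-reports can be combined into an overview of an iteration; this is a minimal sketch where the `it` variable and the `iteration_overview` helper are assumptions, and only the field names come from the documentation above:

```julia
# Minimal sketch: `it` is assumed to be a `Report.Iteration`.
function iteration_overview(it)
    total_time = it.perfs_self_play.time +
                 it.perfs_memory_analysis.time +
                 it.perfs_learning.time
    println("Total iteration time: ", total_time, " s")
    println("Self-play speed: ", it.self_play.samples_gen_speed, " samples/s")
    println("Best network replaced: ", it.learning.nn_replaced)
end
```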
AlphaZero.Report.Perfs — Type

Report.Perfs

Performance report for a subroutine.

- time: total time spent, in seconds
- allocated: amount of memory allocated, in bytes
- gc_time: total amount of time spent in the garbage collector
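For example, the garbage collection overhead of a phase can be estimated from these fields; a minimal sketch, assuming `perfs` is a `Report.Perfs` and that gc_time is expressed in the same unit as time (the helper names are hypothetical):

```julia
# Sketch: fraction of a phase's time spent in garbage collection, and
# allocated memory in mebibytes. `perfs` is assumed to be a `Report.Perfs`.
gc_fraction(perfs) = perfs.gc_time / perfs.time
allocated_mib(perfs) = perfs.allocated / 2^20
```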
Self-Play Phase

AlphaZero.Report.SelfPlay — Type

Report.SelfPlay

Report generated after the self-play phase of an iteration.

- samples_gen_speed: average number of samples generated per second
- average_exploration_depth: see MCTS.average_exploration_depth
- mcts_memory_footprint: estimation of the maximal memory footprint of the MCTS tree during self-play, as computed by MCTS.approximate_memory_footprint
- memory_size: number of samples in the memory buffer at the end of the self-play phase
- memory_num_distinct_boards: number of distinct board positions in the memory buffer at the end of the self-play phase
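As a sketch, these fields can be used to estimate how much of the memory buffer consists of duplicate positions (the `sp` variable and the helper name are assumptions):

```julia
# Sketch: `sp` is assumed to be a `Report.SelfPlay`.
# Fraction of buffer entries that are not distinct board positions.
buffer_duplication_ratio(sp) = 1 - sp.memory_num_distinct_boards / sp.memory_size
```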
Memory Analysis Phase

AlphaZero.Report.Memory — Type

Report.Memory

Report generated by the memory analysis phase of an iteration. It features statistics for:

- the whole memory buffer (all_samples::Report.Samples)
- the samples collected during the last self-play iteration (latest_batch::Report.Samples)
- the subsets of the memory buffer corresponding to different game stages (per_game_stage::Vector{Report.StageSamples})

See MemAnalysisParams.
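For instance, one way to use this report is to compare the network's loss on the latest samples with its loss on the whole buffer; a minimal sketch, assuming `mem` is a `Report.Memory`:

```julia
# Sketch: `mem` is assumed to be a `Report.Memory`.
function compare_losses(mem)
    println("Loss on the whole buffer:   ", mem.all_samples.status.loss.L)
    println("Loss on the latest samples: ", mem.latest_batch.status.loss.L)
end
```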
AlphaZero.Report.Samples — Type

Report.Samples

Statistics about a set of samples, as collected during memory analysis.

- num_samples: total number of samples
- num_boards: number of distinct board positions
- Wtot: total weight of the samples
- status: Report.LearningStatus statistics of the current network on the samples
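As a small illustration, the average weight per sample can be derived from these fields (a sketch, assuming `s` is a `Report.Samples` and the helper name is hypothetical):

```julia
# Sketch: average weight per sample in a `Report.Samples` object `s`.
average_weight(s) = s.Wtot / s.num_samples
```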
AlphaZero.Report.StageSamples — Type

Report.StageSamples

Statistics for the samples corresponding to a particular game stage, as collected during memory analysis.

The samples whose statistics are collected in the samples_stats field correspond to historical positions where the number of remaining moves until the end of the game was in the range defined by the min_remaining_length and max_remaining_length fields.
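Here is a minimal sketch printing the network's loss for each game-stage bucket of a memory report; the `mem` variable and the helper name are assumptions, and only the field names come from the types above:

```julia
# Sketch: `mem` is assumed to be a `Report.Memory`.
function print_per_stage_losses(mem)
    for stage in mem.per_game_stage
        range = "$(stage.min_remaining_length)-$(stage.max_remaining_length)"
        println("Remaining moves ", range, ": loss = ",
                stage.samples_stats.status.loss.L)
    end
end
```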
Learning Phase

AlphaZero.Report.Learning — Type

Report.Learning

Report generated at the end of the learning phase of an iteration.

- time_convert, time_loss, time_train and time_eval are the amounts of time (in seconds) spent converting the samples, computing losses, performing gradient updates and evaluating checkpoints, respectively
- initial_status: status before the learning phase, as an object of type Report.LearningStatus
- losses: loss value on each minibatch
- checkpoints: vector of Report.Checkpoint reports
- nn_replaced: true if the best neural network was replaced
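As an illustration, these fields can be aggregated into a short summary of the learning phase; a minimal sketch, assuming `lr` is a `Report.Learning` (the helper name is hypothetical):

```julia
using Statistics: mean

# Sketch: `lr` is assumed to be a `Report.Learning`.
function learning_summary(lr)
    total = lr.time_convert + lr.time_loss + lr.time_train + lr.time_eval
    println("Learning phase time:   ", total, " s")
    println("Mean minibatch loss:   ", mean(lr.losses))
    println("Checkpoints evaluated: ", length(lr.checkpoints))
    println("Best network replaced: ", lr.nn_replaced)
end
```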
AlphaZero.Report.Checkpoint — Type

Report.Checkpoint

Report generated after a checkpoint evaluation.

- batch_id: number of batches after which the checkpoint was computed
- evaluation: evaluation report from the arena, of type Report.Evaluation
- status_after: learning status at the checkpoint, as an object of type Report.LearningStatus
- nn_replaced: true if the current best neural network was updated after the checkpoint
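For example, one can look for the first checkpoint at which the best network was replaced; a minimal sketch, assuming `checkpoints` is a vector of `Report.Checkpoint` objects (e.g. the checkpoints field of a `Report.Learning`) and the helper name is hypothetical:

```julia
# Sketch: return the `batch_id` of the first checkpoint that replaced the
# best network, or `nothing` if no replacement happened.
function first_replacement(checkpoints)
    i = findfirst(c -> c.nn_replaced, checkpoints)
    isnothing(i) ? nothing : checkpoints[i].batch_id
end
```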
AlphaZero.Report.LearningStatus — Type

Report.LearningStatus

Statistics about the performance of the neural network on a subset of the memory buffer.

- loss: detailed loss on the samples, as an object of type Report.Loss
- Hp: average entropy of the $π$ component of the samples (MCTS policy); this quantity is independent of the network and therefore constant during a learning iteration
- Hpnet: average entropy of the network's prescribed policy on the samples
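One quantity that can be derived from these fields is the gap between the entropy of the network policy and the entropy of the MCTS policy it is trained on; a minimal sketch, assuming `st` is a `Report.LearningStatus` and the helper name is hypothetical:

```julia
# Sketch: difference between the network policy entropy and the MCTS policy
# entropy; `st` is assumed to be a `Report.LearningStatus`.
entropy_gap(st) = st.Hpnet - st.Hp
```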
AlphaZero.Report.Loss — Type

Report.Loss

Decomposition of the loss into a sum of terms (all have type Float32).

- L is the total loss: L == Lp + Lv + Lreg + Linv
- Lp is the policy cross-entropy loss term
- Lv is the average value mean square error
- Lreg is the L2 regularization loss term
- Linv is the loss term penalizing the average weight put by the network on invalid actions
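The stated relation between the total loss and its terms can be checked directly on a report; a minimal sketch, assuming `l` is a `Report.Loss` and the helper name is hypothetical:

```julia
# Sketch: check that the total loss matches the sum of its components,
# up to floating-point rounding. `l` is assumed to be a `Report.Loss`.
loss_consistent(l) = isapprox(l.L, l.Lp + l.Lv + l.Lreg + l.Linv; rtol=1e-5)
```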
Evaluations and Benchmarks

AlphaZero.Report.Evaluation — Type

Report.Evaluation

The outcome of evaluating a player against a baseline player.

Two-player Games

- rewards is the sequence of rewards collected by the evaluated player
- avgr is the average reward collected by the evaluated player
- baseline_rewards is nothing

Single-player Games

- rewards is the sequence of rewards collected by the evaluated player
- baseline_rewards is the sequence of rewards collected by the baseline player
- avgr is equal to mean(rewards) - mean(baseline_rewards)

Common Fields

- legend is a string describing the evaluation
- redundancy is the ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
- time is the computing time spent running the evaluation, in seconds
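For single-player games, avgr can be recomputed from the two reward sequences; a minimal sketch, assuming `e` is a `Report.Evaluation` whose baseline_rewards field is not nothing (the helper name is hypothetical):

```julia
using Statistics: mean

# Sketch: recompute `avgr` for a single-player evaluation report `e`.
single_player_avgr(e) = mean(e.rewards) - mean(e.baseline_rewards)
```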
AlphaZero.Report.Benchmark — Type

const Report.Benchmark = Vector{Report.Evaluation}

A benchmark report is a vector of Evaluation objects.
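Since a benchmark report is just a vector, it can be traversed directly; a minimal sketch, assuming `bench` is a `Report.Benchmark` (the helper name is hypothetical):

```julia
# Sketch: print one line per evaluation in a `Report.Benchmark`.
function print_benchmark(bench)
    for e in bench
        println(e.legend, ": avgr = ", e.avgr, " (", e.time, " s)")
    end
end
```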