Training Reports
AlphaZero.Report — Module

Analytical reports generated during training, for debugging and hyperparameter tuning.
AlphaZero.Report.Initial — Type

Report summarizing the configuration of an agent before training starts.

- `num_network_parameters`: see `Network.num_parameters`
- `num_network_regularized_parameters`: see `Network.num_regularized_parameters`
- `mcts_footprint_per_node`: see `MCTS.memory_footprint_per_node`
AlphaZero.Report.Iteration — Type

Report generated after each training iteration.

Fields:

- `self_play`, `memory` and `learning` have types `Report.SelfPlay`, `Report.Memory` and `Report.Learning` respectively
- `perfs_self_play`, `perfs_memory_analysis` and `perfs_learning` are performance reports for the different phases of the iteration, with type `Report.Perfs`
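For concreteness, here is a minimal sketch of how such a report might be summarized once you hold one (e.g. obtained from a training handler or a saved session). The `summarize` helper is hypothetical, and it assumes memory analysis is enabled so that `perfs_memory_analysis` is populated:

```julia
# Hypothetical helper: print a one-screen summary of a Report.Iteration,
# following the field list documented above.
function summarize(rep)
    total = rep.perfs_self_play.time +
            rep.perfs_memory_analysis.time +
            rep.perfs_learning.time
    println("Iteration wall time: $(round(total; digits=1))s")
    println("Self-play speed: $(rep.self_play.samples_gen_speed) samples/s")
    println("Best network replaced: $(rep.learning.nn_replaced)")
end
```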
AlphaZero.Report.Perfs — Type

Performance report for a subroutine.

- `time`: total time spent, in seconds
- `allocated`: amount of memory allocated, in bytes
- `gc_time`: total amount of time spent in the garbage collector, in seconds
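These three quantities correspond to what Julia's built-in `@timed` macro reports (on recent Julia versions it returns a named tuple with `time`, `bytes` and `gctime` fields). The following sketch, with a hypothetical `measure_perfs` helper, shows how a report of this shape can be gathered:

```julia
# Hypothetical helper mirroring the Report.Perfs layout, built on Base.@timed.
function measure_perfs(f)
    stats = @timed f()
    return (time = stats.time,        # total time spent, in seconds
            allocated = stats.bytes,  # memory allocated, in bytes
            gc_time = stats.gctime)   # time spent in the garbage collector
end

# Usage: measure a stand-in for a training subroutine.
perfs = measure_perfs() do
    sum(rand(10^6))
end
```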
Self-Play Phase
AlphaZero.Report.SelfPlay — Type

Report generated after the self-play phase of an iteration.

- `inference_time_ratio`: see `MCTS.inference_time_ratio`
- `samples_gen_speed`: average number of samples generated per second
- `average_exploration_depth`: see `MCTS.average_exploration_depth`
- `mcts_memory_footprint`: estimation of the maximal memory footprint of the MCTS tree during self-play, as computed by `MCTS.approximate_memory_footprint`
- `memory_size`: number of samples in the memory buffer at the end of the self-play phase
- `memory_num_distinct_boards`: number of distinct board positions in the memory buffer at the end of the self-play phase
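As a quick illustration, the memory-related fields can be rendered in human-readable form. In the sketch below, `sp` stands for a `Report.SelfPlay` value you already hold, and the footprint is assumed to be reported in bytes:

```julia
# Sketch: human-readable view of the memory figures in a Report.SelfPlay.
mib = sp.mcts_memory_footprint / 2^20  # bytes -> MiB (assumes a byte count)
println("Peak MCTS tree footprint: $(round(mib; digits=1)) MiB")
println("Replay buffer: $(sp.memory_size) samples, ",
        "$(sp.memory_num_distinct_boards) distinct boards")
```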
Memory Analysis Phase
AlphaZero.Report.Memory — Type

Report generated by the memory analysis phase of an iteration. It features statistics for:

- the whole memory buffer (`all_samples::Report.Samples`)
- the samples collected during the last self-play iteration (`latest_batch::Report.Samples`)
- the subsets of the memory buffer corresponding to different game stages (`per_game_stage::Vector{Report.StageSamples}`)

See `MemAnalysisParams`.
AlphaZero.Report.Samples — Type

Statistics about a set of samples, as collected during memory analysis.

- `num_samples`: total number of samples
- `num_boards`: number of distinct board positions
- `Wtot`: total weight of the samples
- `status`: `Report.LearningStatus` statistics of the current network on the samples
AlphaZero.Report.StageSamples — Type

Statistics for the samples corresponding to a particular game stage, as collected during memory analysis.

The samples whose statistics are collected in the `samples_stats` field correspond to historical positions where the number of remaining moves until the end of the game was in the range defined by the `min_remaining_length` and `max_remaining_length` fields.
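Putting `Report.Memory`, `Report.StageSamples` and `Report.Samples` together, here is a sketch of how per-stage statistics might be inspected; `mem` is assumed to be a `Report.Memory` value:

```julia
# Sketch: print the network's total loss for each game stage, following
# the field chain documented above:
# StageSamples -> samples_stats::Samples -> status::LearningStatus -> loss::Loss
for stage in mem.per_game_stage
    stats = stage.samples_stats
    range = "$(stage.min_remaining_length)-$(stage.max_remaining_length)"
    println("moves left in $range: ",
            "$(stats.num_samples) samples, loss = $(stats.status.loss.L)")
end
```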
Learning Phase
AlphaZero.Report.Learning — Type

Report generated at the end of the learning phase of an iteration.

- `time_convert`, `time_loss`, `time_train` and `time_eval` are the amounts of time (in seconds) spent converting the samples, computing losses, performing gradient updates and evaluating checkpoints, respectively
- `initial_status`: status before the learning phase, as an object of type `Report.LearningStatus`
- `losses`: loss value on each minibatch
- `checkpoints`: vector of `Report.Checkpoint` reports
- `nn_replaced`: true if the best neural network was replaced
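For instance, the four timing fields make it easy to see where the learning phase spends its time. In this sketch, `lrep` is assumed to be a `Report.Learning` value:

```julia
# Sketch: timing breakdown of the learning phase.
for (label, t) in [("convert", lrep.time_convert),
                   ("loss",    lrep.time_loss),
                   ("train",   lrep.time_train),
                   ("eval",    lrep.time_eval)]
    println(rpad(label, 10), round(t; digits=2), "s")
end
```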
AlphaZero.Report.Checkpoint — Type

Report generated after a checkpoint evaluation.

- `batch_id`: number of batches after which the checkpoint was computed
- `status_after`: learning status at the checkpoint, as an object of type `Report.LearningStatus`
- `reward`: average reward collected by the contender network
- `redundancy`: ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
- `nn_replaced`: true if the current best neural network was updated after the checkpoint
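As an example, selecting the checkpoint at which the contender performed best is a one-liner (the two-argument `argmax` requires Julia 1.7 or later); `lrep.checkpoints` is the vector documented under `Report.Learning`:

```julia
# Sketch: find the checkpoint with the highest contender reward.
best = argmax(c -> c.reward, lrep.checkpoints)
println("Best checkpoint after $(best.batch_id) batches: ",
        "reward = $(best.reward), redundancy = $(best.redundancy)")
```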
AlphaZero.Report.LearningStatus — Type

Statistics about the performance of the neural network on a subset of the memory buffer.

- `loss`: detailed loss on the samples, as an object of type `Report.Loss`
- `Hp`: average entropy of the $π$ component of the samples (MCTS policy); this quantity is independent of the network and therefore constant during a learning iteration
- `Hpnet`: average entropy of the network's prescribed policy on the samples
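One way to read these two numbers side by side: since lower entropy means a more peaked distribution, `Hpnet` falling well below `Hp` suggests the network's policy is more confident than the MCTS targets it is trained on. A small sketch, with `st` a `Report.LearningStatus` value:

```julia
# Sketch: compare the data's policy entropy with the network's.
gap = st.Hp - st.Hpnet
println("Hp = $(st.Hp), Hpnet = $(st.Hpnet), gap = $(round(gap; digits=3))")
```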
AlphaZero.Report.Loss — Type

Decomposition of the loss into a sum of terms (all of type `Float32`).

- `L` is the total loss: `L == Lp + Lv + Lreg + Linv`
- `Lp` is the policy cross-entropy loss term
- `Lv` is the average value mean square error
- `Lreg` is the L2 regularization loss term
- `Linv` is the loss term penalizing the average weight put by the network on invalid actions
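To make the decomposition concrete, here is a sketch that checks the documented identity and prints each term's relative contribution; `l` is assumed to be a `Report.Loss` value:

```julia
# Sketch: relative contribution of each loss term.
for (name, term) in [("policy (Lp)", l.Lp), ("value (Lv)", l.Lv),
                     ("regularization (Lreg)", l.Lreg),
                     ("invalid actions (Linv)", l.Linv)]
    println(rpad(name, 24), round(100 * term / l.L; digits=1), "%")
end
@assert l.L ≈ l.Lp + l.Lv + l.Lreg + l.Linv  # the documented identity
```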