Analytical reports generated during training, for debugging and hyperparameter tuning.
Report summarizing the configuration of an agent before training starts.
Report generated after each training iteration.
Performance report for a subroutine.
time: total time spent, in seconds
allocated: amount of memory allocated, in bytes
gc_time: total amount of time spent in the garbage collector
Report generated after the self-play phase of an iteration.
samples_gen_speed: average number of samples generated per second
mcts_memory_footprint: estimate of the maximal memory footprint of the MCTS tree during self-play
memory_size: number of samples in the memory buffer at the end of the self-play phase
memory_num_distinct_boards: number of distinct board positions in the memory buffer at the end of the self-play phase
Report generated by the memory analysis phase of an iteration. It features statistics for
- the whole memory buffer
- the samples collected during the last self-play iteration
- the subsets of the memory buffer corresponding to different game stages
Statistics about a set of samples, as collected during memory analysis.
num_samples: total number of samples
num_boards: number of distinct board positions
Wtot: total weight of the samples
status: statistics of the current network on the samples, as an object of type Report.LearningStatus
Statistics for the samples corresponding to a particular game stage, as collected during memory analysis.
The samples whose statistics are collected in the samples_stats field correspond to historical positions where the number of remaining moves until the end of the game was in the range defined by the min_remaining_length and max_remaining_length fields.
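To illustrate the game-stage split described above, here is a minimal sketch (the function and parameter names are hypothetical, not part of the package's API): samples are bucketed by how many moves remained until the end of the game when they were recorded.

```python
# Illustrative sketch (not AlphaZero.jl code): partition sample indices into
# game stages according to the number of remaining moves until game end.
def partition_by_stage(remaining_moves, boundaries):
    """Group sample indices into stages delimited by `boundaries`.

    `boundaries` is a sorted list of cut points; stage i holds the samples
    whose remaining-move count lies in [boundaries[i], boundaries[i+1]).
    """
    stages = [[] for _ in range(len(boundaries) - 1)]
    for idx, n in enumerate(remaining_moves):
        for i in range(len(boundaries) - 1):
            if boundaries[i] <= n < boundaries[i + 1]:
                stages[i].append(idx)
                break
    return stages

# Example: three stages (endgame, midgame, opening) for a game of <= 60 moves.
stages = partition_by_stage([3, 25, 50, 12], [0, 20, 40, 61])
# → [[0, 3], [1], [2]]
```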
Report generated at the end of the learning phase of an iteration.
time_convert, time_loss, time_train and time_eval are the amounts of time (in seconds) spent converting the samples, computing losses, performing gradient updates and evaluating checkpoints, respectively
initial_status: status before the learning phase, as an object of type Report.LearningStatus
losses: loss value on each minibatch
checkpoints: vector of checkpoint evaluation reports (described below), one per checkpoint
nn_replaced: true if the best neural network was replaced
Report generated after a checkpoint evaluation.
batch_id: number of batches after which the checkpoint was computed
status_after: learning status at the checkpoint, as an object of type Report.LearningStatus
reward: average reward collected by the contender network
redundancy: ratio of duplicate positions encountered during the evaluation, not counting the initial position. If this number is too high, you may want to increase the move selection temperature.
nn_replaced: true if the current best neural network was updated after the checkpoint
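For intuition on the redundancy field, a minimal sketch of the duplicate-position ratio is given below; the helper name is hypothetical and the package's exact definition may differ in details.

```python
# Illustrative sketch: fraction of duplicate positions among the positions
# encountered during an evaluation, not counting the initial position.
def redundancy(positions):
    """`positions` lists every position encountered, initial position first."""
    rest = positions[1:]  # exclude the initial position
    if not rest:
        return 0.0
    return 1.0 - len(set(rest)) / len(rest)

# Example: of the 4 non-initial positions, one repeats an earlier one.
r = redundancy(["start", "a", "b", "a", "c"])
# → 0.25
```

A high value suggests the evaluated games keep revisiting the same lines, which is why raising the move selection temperature (adding diversity) helps.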
Statistics about the performance of the neural network on a subset of the memory buffer.
loss: detailed loss on the samples, as an object of type
Hp: average entropy of the $π$ component of samples (MCTS policy); this quantity is independent of the network and therefore constant during a learning iteration
Hpnet: average entropy of the network's prescribed policy on the samples
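Both Hp and Hpnet are average Shannon entropies of probability vectors over actions; a minimal sketch of the computation (not the package's implementation) follows.

```python
import math

def avg_entropy(policies):
    """Average Shannon entropy (in nats) of a list of probability vectors."""
    def entropy(p):
        return -sum(x * math.log(x) for x in p if x > 0)
    return sum(entropy(p) for p in policies) / len(policies)

# A uniform policy over two actions has entropy log(2) ≈ 0.693;
# a deterministic policy has entropy 0, so the average here is ≈ 0.347.
h = avg_entropy([[0.5, 0.5], [1.0, 0.0]])
```

Applied to the π targets this gives Hp (constant during an iteration, since the samples are fixed); applied to the network's outputs it gives Hpnet.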
Decomposition of the loss into a sum of terms (all have type Float32).
L is the total loss:
L == Lp + Lv + Lreg + Linv
Lp is the policy cross-entropy loss term
Lv is the average value mean square error
Lreg is the L2 regularization loss term
Linv is the loss term penalizing the average weight put by the network on invalid actions
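The decomposition above can be sketched for a single sample as follows. The formulas are the standard choices matching the term descriptions (cross-entropy for Lp, squared error for Lv, L2 penalty for Lreg); the function name, argument names, and the regularization weight c_reg are illustrative, not the package's exact API or weighting, and Linv is stubbed out since it requires the game's action mask.

```python
import math

def loss_terms(pi, p, z, v, params, c_reg=1e-4):
    """Sketch of the loss decomposition for one sample.
    pi: MCTS policy target, p: network policy, z: value target,
    v: predicted value, params: flat list of network weights."""
    Lp = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)  # policy cross-entropy
    Lv = (z - v) ** 2                                           # value squared error
    Lreg = c_reg * sum(w * w for w in params)                   # L2 regularization
    Linv = 0.0  # would penalize probability mass on invalid actions (needs a mask)
    L = Lp + Lv + Lreg + Linv                                   # total loss
    return L, Lp, Lv, Lreg, Linv
```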