Type for an AlphaZero environment.

The environment holds the current neural network, the best neural network seen so far (which is used for data generation), a memory buffer, and an iteration counter.


Env(game_spec, params, curnn, bestnn=copy(curnn), experience=[], itc=0)

Construct a new AlphaZero environment:

  • game_spec specifies the game being played
  • params has type Params
  • curnn is the current neural network and has type AbstractNetwork
  • bestnn is the best neural network so far, which is used for data generation
  • experience is the initial content of the memory buffer as a vector of TrainingSample
  • itc is the value of the iteration counter (0 at the start of training)
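
The argument defaults above can be illustrated with a small self-contained sketch. It does not use the real Env type: ToyNetwork and ToyEnv are hypothetical stand-ins that only mirror the documented signature, and the sketch assumes that copy returns an independent network (as the toy copy method here does), so that later changes to curnn leave bestnn untouched.

```julia
# Hypothetical stand-in for a network type (the real one subtypes AbstractNetwork).
struct ToyNetwork
    weights::Vector{Float64}
end

# Assumption: copying a network yields an independent set of weights.
Base.copy(nn::ToyNetwork) = ToyNetwork(copy(nn.weights))

# Toy environment mirroring the documented positional arguments and defaults.
mutable struct ToyEnv
    game_spec
    params
    curnn
    bestnn
    experience::Vector{Any}
    itc::Int
    function ToyEnv(game_spec, params, curnn,
                    bestnn=copy(curnn), experience=[], itc=0)
        return new(game_spec, params, curnn, bestnn, experience, itc)
    end
end

# Fresh environment: only the three required arguments are given.
env = ToyEnv("toy-game", (num_iters=1,), ToyNetwork([0.1, 0.2]))

env.curnn.weights[1] = 9.9     # update the current network in place
println(env.bestnn.weights)    # prints [0.1, 0.2]: bestnn is a separate copy
println(env.itc)               # prints 0: training has not started

# Resumed environment: all six arguments are passed explicitly.
resumed = ToyEnv("toy-game", (num_iters=1,),
                 env.curnn, env.bestnn, env.experience, 5)
println(resumed.itc)           # prints 5
```

Passing all six arguments is what a resumed run would look like in this toy setting: the saved networks, memory buffer, and iteration counter are supplied instead of the defaults.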

Namespace for the callback functions that are used during training. This enables logging, saving and plotting to be implemented separately. An example handler object is Session.

All callback functions take a handler object h as their first argument and, in some cases, a report r as their second argument.

  iteration_started(h)         called at the beginning of an iteration
  self_play_started(h)         called once per iteration before self-play starts
  game_played(h)               called after each self-play game
  self_play_finished(h, r)     sends report: Report.SelfPlay
  memory_analyzed(h, r)        sends report: Report.Memory
  learning_started(h)          called at the beginning of the learning phase
  updates_started(h, r)        sends report: Report.LearningStatus
  updates_finished(h, r)       sends report: Report.LearningStatus
  checkpoint_started(h)        called before a checkpoint evaluation starts
  checkpoint_game_played(h)    called after each arena game
  checkpoint_finished(h, r)    sends report: Report.Checkpoint
  learning_finished(h, r)      sends report: Report.Learning
  iteration_finished(h, r)     sends report: Report.Iteration
  training_finished(h)         called once at the end of training

train!(env::Env, handler=nothing)

Start or resume the training of an AlphaZero agent.

A handler object can be passed that implements a subset of the callback functions defined in Handlers.
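
One way a handler can implement only a subset of the callbacks is sketched below. The sketch is self-contained and does not call the real library: ToyHandlers, CountingHandler, and toy_train! are hypothetical names, and it assumes that every callback has a generic no-op fallback, so that a handler (or nothing) only needs methods for the events it cares about.

```julia
# Hypothetical stand-in for the Handlers namespace: each callback has a
# generic no-op method that accepts any handler, including `nothing`.
module ToyHandlers
    iteration_started(h) = nothing
    game_played(h) = nothing
    self_play_finished(h, r) = nothing
    iteration_finished(h, r) = nothing
    training_finished(h) = nothing
end

# A handler that counts self-play games and stores iteration reports.
mutable struct CountingHandler
    games::Int
    reports::Vector{Any}
end
CountingHandler() = CountingHandler(0, Any[])

# Only two callbacks are specialized; the others fall back to the no-ops.
ToyHandlers.game_played(h::CountingHandler) = (h.games += 1; nothing)
ToyHandlers.iteration_finished(h::CountingHandler, r) =
    (push!(h.reports, r); nothing)

# Toy driver that fires a few of the callbacks (not the real train!).
function toy_train!(handler=nothing; iterations=2, games_per_iteration=3)
    for itc in 1:iterations
        ToyHandlers.iteration_started(handler)
        for _ in 1:games_per_iteration
            ToyHandlers.game_played(handler)
        end
        ToyHandlers.self_play_finished(handler, (games=games_per_iteration,))
        ToyHandlers.iteration_finished(handler, (iteration=itc,))
    end
    ToyHandlers.training_finished(handler)
    return handler
end

h = toy_train!(CountingHandler())
println(h.games)             # prints 6 (2 iterations of 3 games)
println(length(h.reports))   # prints 2 (one report per iteration)

toy_train!()                 # handler=nothing: every callback is a no-op
```

Because the fallbacks accept any value, the default handler=nothing needs no special casing in the driver, and logging, saving, or plotting can each live in their own handler type.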