ConvergenceControl-v0 (experimental) (by @iaroslav-ai)
The agent can adjust training parameters such as step size and momentum during the training of a deep convolutional neural network, to improve its convergence and the quality of the end result. One episode in this environment is the training of one neural net for 20 epochs. The agent can adjust the parameters at the beginning of every epoch.
ConvergenceControl-v0 is an unsolved environment, which means it does not have a specified reward threshold at which it's considered solved.
The parameters the agent can adjust are the SGD learning rate and momentum coefficients, the batch size, and the L1 and L2 penalty coefficients. As feedback, the agent receives the number of instances and labels in the dataset, a description of the network architecture, and the validation accuracy for every epoch.
The network architecture and the dataset are selected randomly at the beginning of an episode. The datasets used are MNIST, CIFAR10, and CIFAR100. The network architectures are multilayer convnets 66% of the time, and classic feedforward nets otherwise.
The number of training instances is chosen at random, between roughly 5% and 100% of the full dataset, so that adjusting the L1 and L2 penalty coefficients makes more of a difference.
Let a denote the best validation accuracy achieved so far at a given epoch; the reward at every step is then a + a*a. On the one hand, this encourages fast convergence, since reaching high accuracy early improves the cumulative reward over the episode. On the other hand, improving the best achieved accuracy improves the cumulative reward quadratically, encouraging the agent both to converge quickly and to reach a high best validation accuracy.
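The reward scheme above can be sketched in a few lines of Python (the function names are illustrative, not taken from the environment's source; a is the running best validation accuracy):

```python
def step_reward(best_acc):
    """Per-epoch reward as described: r = a + a*a, where a is the best
    validation accuracy achieved so far in the episode."""
    return best_acc + best_acc * best_acc

def episode_return(val_accuracies):
    """Cumulative reward over an episode: at each epoch the reward is
    computed from the best accuracy seen up to and including that epoch."""
    best = 0.0
    total = 0.0
    for acc in val_accuracies:
        best = max(best, acc)  # reward never decreases when accuracy dips
        total += step_reward(best)
    return total
```

Note how an early jump in accuracy pays off at every subsequent epoch, which is exactly the "converge fast" incentive described above.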
As the number of labels increases, the learning problem becomes more difficult for a fixed dataset size. To keep the agent from ignoring more complex datasets, where accuracy is low, and concentrating on simple cases that bring the bulk of the reward, the accuracy is normalized by the number of labels in the dataset.
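One plausible reading of this normalization (an assumption, not the environment's documented formula) is to report accuracy relative to chance level, which is 1/num_labels for a balanced classification dataset:

```python
def normalized_accuracy(raw_acc, num_labels):
    """Hypothetical normalization: express accuracy as a multiple of the
    chance level (1 / num_labels), so that, e.g., 50% on a 10-class
    dataset and 5% on a 100-class dataset yield the same value.
    This exact formula is an assumption, not taken from the env source."""
    chance = 1.0 / num_labels
    return raw_acc / chance
```

Under this reading, a hard 100-class dataset is no longer dominated in the reward by an easy 10-class one just because its raw accuracy is lower.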
| Algorithm | Best 100-episode performance | Submitted |
| --- | --- | --- |