On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima