On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima