A depiction of a complex error surface. Image from the Snapshot Ensembles paper.

Do smoother areas of the error surface lead to better generalization?

An experiment inspired by the first lecture of the fast.ai MOOC

The setup

I trained 100 simple neural networks (each with a single hidden layer of 20 units) on the MNIST dataset, achieving an average test-set accuracy of 95.4%. I then measured each network's ability to generalize as the difference between its loss on the train set and its loss on the test set.
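The setup above can be sketched roughly as follows. This is an assumption about the implementation, not the author's actual code: for a quick, self-contained example, scikit-learn's small `digits` dataset stands in for MNIST, and only a handful of networks are trained instead of 100.

```python
# Sketch of the experiment: train several one-hidden-layer (20-unit)
# networks and record the generalization gap (test loss minus train loss).
# NOTE: `load_digits` is a small stand-in for MNIST so this runs quickly;
# the original experiment used MNIST and 100 networks.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gaps = []
for seed in range(5):  # 5 networks here; the post trains 100
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=300,
                        random_state=seed)
    net.fit(X_train, y_train)
    train_loss = log_loss(y_train, net.predict_proba(X_train))
    test_loss = log_loss(y_test, net.predict_proba(X_test))
    gaps.append(test_loss - train_loss)  # generalization gap

mean_gap = sum(gaps) / len(gaps)
print(mean_gap)
```

Each element of `gaps` is one network's generalization gap; the experiment then asks whether this quantity correlates with properties of the error surface around the solution each network found.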


Can we safely conclude that the relationship we hypothesized does not hold? By no means; we cannot draw that conclusion either!



Radek Osmulski

I ❤️ ML / DL ideas — I tweet about them / write about them / implement them. Recommender Systems at NVIDIA