A detailed plan based on experience


This is a chapter from my upcoming book, Meta-Learning: Powerful Mental Models for Deep Learning and Thriving in the Digital Age. It is currently available for pre-order and you can pick it up at 50% off.

Before you can learn to do deep learning, you must become a developer. Being a developer is not only about programming. In the larger scheme of things, being able to write code is just a small part of what a developer can do.

If I were to start all over, I would optimize this part of the journey for fun. I would want to…


We write machine learning code in a very specific context. But from what I have seen so far, nothing has convinced me that machine learning code is fundamentally different from any other type of code.

This means that standard development practices apply, with testing being a very important component.

The rewards of testing can be immense, but so can the price one pays for testing poorly or not at all.
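As an illustration, a test for machine learning code looks just like a test for any other code. Here is a minimal sketch, assuming a hypothetical `normalize()` preprocessing helper (the function name and behavior are mine, for illustration only):

```python
def normalize(values):
    """Scale a list of numbers linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_bounds():
    # After normalization, the smallest value maps to 0 and the largest to 1.
    out = normalize([3.0, 7.0, 5.0])
    assert min(out) == 0.0
    assert max(out) == 1.0
    # Relative ordering is preserved.
    assert out == sorted([out[0], out[2], out[1]]) or out[1] > out[2] > out[0]

test_normalize_bounds()
```

Nothing here is specific to machine learning, and that is the point: the same plain assertions guard data pipelines just as well as they guard web apps.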

Let’s take a closer look at what testing looks like in the context of various machine learning applications.

Scenario 1 — writing single-purpose code

This is the bread and butter of…

Evaluation of cosine annealing

Imagine you live in the mountains. One of your kin has fallen sick and you volunteer to get medicine.

You stop by your house to grab the necessities — a map of the city along with a marble-shaped rock you claim brings you good fortune. You hop onto your dragon and fly north.

All that matters initially is a general sense of direction. The details on the ground are barely visible and you cover distance quickly.

As is widely known, dragons need a lot of space to land. …

Data science is a conspiracy.

“Hi, my name is Bob and I’ll be your instructor. I’ll teach you how to drive a car. Open your books on page 147 and let’s learn about different types of exhaust manifolds. Here is the formula for the essential Boyle’s Law…”

This would never happen, right?

And yet in teaching data science, elaborating on complex topics is commonplace, whereas no love is given to the fundamentals. We are not told when to accelerate and when to slow down. …

The missing manual to Twitter I wish I had


There are many good reasons why you might want to start using Twitter. Rachel Thomas lists some of the more important ones in “Making Peace with Personal Branding”.

For me, the appeal is simple. I get to listen and occasionally talk to people doing amazing things in the field I care about (Machine Learning with a focus on Deep Learning). I also sometimes fantasize that what I write is helpful to others. Mostly that is just me dreaming things up.

But Twitter is not without its quirks, and some things are not what they appear.

Below I present the missing manual…

I have just come out of a project where, 80% of the way in, I felt I had very little to show. I invested a lot of time, and in the end it was a total fiasco.

The math that I know or do not know, my ability to write code — all of this has been secondary. The way I approached the project was what was broken.

I now believe that there is an art, or craftsmanship, to structuring machine learning work, and none of the math-heavy books I tended to binge on seems to mention this.

I did a bit of…

I started using PyTorch a couple of days ago. Below I outline key PyTorch concepts along with a couple of observations that I found particularly useful as I was getting my feet wet with the framework (and which can lead to a lot of frustration if you are not aware of them!).


Tensor — (like) a numpy.ndarray but can live on the GPU.

Variable — wraps a tensor so that it can take part in a computation. If created with requires_grad = True, it will have gradients calculated during the backwards phase.

How PyTorch works

You perform calculations by writing them out…
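The concepts above can be sketched in a few lines. One caveat: since PyTorch 0.4 the Variable wrapper has been merged into Tensor, so in current versions you set requires_grad on the tensor directly rather than wrapping it:

```python
import torch

# A tensor tracked by autograd (in old PyTorch: Variable(t, requires_grad=True)).
x = torch.ones(2, 2, requires_grad=True)

# Calculations are written out directly; PyTorch records them as it goes.
y = (3 * x).sum()

# The backwards phase computes gradients for every tracked tensor.
y.backward()

print(x.grad)  # dy/dx = 3 for every element of x
```

The recorded computation graph is what makes the eager, write-it-as-you-go style possible: you never declare the graph up front, you just compute.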

A depiction of a complex error surface. Image from the Snapshot Ensembles paper.

An experiment inspired by the first lecture of the fast.ai MOOC

In the first lecture of the outstanding Deep Learning Course (linking to version 1, which is also superb; v2 to become available early 2018), we learned how to train a state-of-the-art model using very recent techniques (for instance, the optimal learning rate estimation described in the Cyclical Learning Rates for Training Neural Networks paper from 2015).

While explaining stochastic gradient descent with restarts, Jeremy Howard made a very interesting point — upon convergence, we would like to find ourselves in a part of the weight space that is resilient, meaning where small changes to the weights…
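The restarts pair with a cosine annealing schedule: the learning rate decays from a maximum to a minimum over each cycle, then jumps back up so the optimizer can hop out of sharp minima. A minimal sketch, with illustrative values of my own choosing rather than anything from the course or paper:

```python
import math

def cosine_annealing_lr(step, cycle_len, lr_max=0.1, lr_min=0.001):
    """Learning rate for SGDR-style cosine annealing with restarts.

    Within each cycle of cycle_len steps, the rate follows half a
    cosine wave from lr_max down toward lr_min, then restarts.
    """
    t = step % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))
```

At step 0 the rate is lr_max, halfway through a cycle it sits at the midpoint of lr_max and lr_min, and at the start of the next cycle it snaps back to lr_max — that snap is the "restart".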

In this article we will take a look at two ideas that can help you make the most of your training data.

To get a better feel for the techniques, we will apply them to beating the 2013 state of the art on distinguishing cats from dogs in images. The plot twist is that we will only use 5% of the original training data.

We will compete against an accuracy of 82.37% achieved using 13,000 training images. Our train set will consist of 650 images selected at random.

Models will be constructed from scratch and we…

In 2013, Kaggle ran the very popular Dogs vs. Cats competition. The objective was to train an algorithm to detect whether an image contains a cat or a dog.

At that time, as stated on the competition website, the state-of-the-art algorithm could tell a cat from a dog with an accuracy of 82.7% after having been trained on 13,000 cat and dog images.

My results

I applied transfer learning, a technique where you take a model trained to carry out a different though similar task and retrain it to do…
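The idea can be sketched in a few lines of PyTorch. The tiny nn.Sequential backbone below is only a stand-in for a real pretrained network — in practice it would come from, say, torchvision with pretrained weights:

```python
import torch.nn as nn

# Stand-in for a pretrained feature extractor (hypothetical sizes).
backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())

# Freeze the pretrained weights so only the new head learns.
for p in backbone.parameters():
    p.requires_grad = False

# Attach a fresh classification head for the new task: cat vs dog.
model = nn.Sequential(backbone, nn.Linear(256, 2))
```

Training then updates only the new head, which is why transfer learning can get away with a few hundred images where training from scratch needs thousands.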

Radek Osmulski

I ❤️ ML / DL ideas — I tweet about them / write about them / implement them. Self-taught RoR developer by trade.
