My first neural network classifier, statue or painting?

Recently I decided to learn deep learning and the math behind the training process. I’d like to summarize what I’ve learned so far by writing a simple but efficient classifier: a neural network capable of telling whether an image contains a painting or a statue.

The code uses the fastai deep learning library, mainly because of its motto, "Making neural nets uncool again". I wrote it in Colab, a cloud-hosted Python notebook editor with GPU support.

Prepare the dataset #

Last week I went to the Uffizi Museum and took some pictures of paintings and statues, so I decided to use them as my training data.

I removed the blurred photos and split the rest into two folders, named paint and statue. The folder name is very important because the training code uses it to label the content.

Install the libraries #

The first lines of code are for installing libraries and importing the necessary modules.

!pip install fastai==2.2.5
!pip install exif

from fastai.data.all import *
from fastai.vision.all import *
from exif import Image as ExifImage

One issue I had is that fastai, by default, doesn’t rotate images according to their EXIF metadata, so I searched for and found a piece of code that rotates the photos to the correct orientation before feeding them to the neural network.

def exif_type_tfms(fn, cls, **kwargs):     
  def get_orientation(fn: (Path, str)):
    with open(fn, 'rb') as image_file:
      exif_img = ExifImage(image_file)
      try:
        return exif_img.orientation.value
      except AttributeError:
        # ignore orientation unset
        return 1
  def f(img, rotate=0, transpose=False):
    img = img.rotate(rotate, expand=True)
    if transpose:
      img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img

  # Image.rotate takes shortcuts for these magic angles, so there is no need
  # for any specific resampling strategy
  trafo_fns = {
    1: partial(f, rotate=0),
    6: partial(f, rotate=270),
    8: partial(f, rotate=90),
    3: partial(f, rotate=180),
    2: partial(f, rotate=0, transpose=True),
    5: partial(f, rotate=270, transpose=True),
    7: partial(f, rotate=90, transpose=True),
    4: partial(f, rotate=180, transpose=True),
  }
  img = cls.create(fn, **kwargs)
  orientation = get_orientation(fn)
  img = trafo_fns[orientation](img)
  return cls(img)

def ExifImageBlock(cls=PILImage):
  """
  if images are rotated with the EXIF orientation flag
  it must be respected when loading the images

  ExifImageBlock can be pickled (which is important to dump learners)
  >>> pickle.dumps(ExifImageBlock())
  b'...
  """
  return TransformBlock(type_tfms=partial(exif_type_tfms, cls=cls), batch_tfms=IntToFloatTensor)
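A quick way to sanity-check the lookup table above is to replay two of its entries on a tiny grid of numbers instead of a real photo. This is only a sketch in plain Python: the helper names (`rotate_ccw`, `flip_left_right`, `apply_orientation_fix`, and so on) are made up for illustration and are not part of fastai or PIL, and it assumes that, as with `Image.rotate`, positive angles rotate counter-clockwise. It checks that "rotate 270° then mirror" (EXIF orientation 5) is a transpose, and that "rotate 180° then mirror" (orientation 4) is a top-to-bottom flip.

```python
def rotate_ccw(grid, degrees):
    """Rotate a 2D list counter-clockwise by a multiple of 90 degrees."""
    for _ in range((degrees // 90) % 4):
        # transposing and then reversing the row order is one 90° CCW turn
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid

def flip_left_right(grid):
    return [row[::-1] for row in grid]

def flip_top_bottom(grid):
    return grid[::-1]

def transpose(grid):
    return [list(row) for row in zip(*grid)]

def apply_orientation_fix(grid, rotate=0, mirror=False):
    # same order as f() in the snippet above: rotate first, then mirror
    grid = rotate_ccw(grid, rotate)
    return flip_left_right(grid) if mirror else grid

# a 2 x 3 "image" whose pixels are just numbers
pixels = [[1, 2, 3],
          [4, 5, 6]]

fix_5 = apply_orientation_fix(pixels, rotate=270, mirror=True)
fix_4 = apply_orientation_fix(pixels, rotate=180, mirror=True)

print(fix_5 == transpose(pixels))        # True
print(fix_4 == flip_top_bottom(pixels))  # True
```

The same trick works for the other entries if you want to double-check them.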

Configure the datasets #

datablock = DataBlock(
    get_items=get_image_files, # how to gather images
    item_tfms=Resize(224), # how to transform images before using them
    blocks=(ExifImageBlock, CategoryBlock), # (input transforms, label transforms)
    get_y=parent_label, # how to label the images
    splitter=RandomSplitter()) # how to split into training and validation datasets

A DataBlock is a list of instructions that tells fastai how to gather the dataset, label it, transform it, and split it into a training dataset and a validation dataset.

The get_image_files function is a utility that grabs all the files in a folder, recursively.

Resize(224) resizes each image to 224 x 224 px. Why 224? It’s a common input size for models pre-trained on ImageNet. You can use another value, but remember that a larger image requires more resources to process.
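To get a feeling for how quickly the cost grows with the image size, here is a tiny back-of-the-envelope sketch. The function name `values_per_image` is made up for this example, and the count of input values is only a rough indicator: the real memory and compute cost also depends on the network itself.

```python
def values_per_image(side, channels=3):
    """Number of numeric values in a square image with the given side (RGB by default)."""
    return side * side * channels

small = values_per_image(224)
large = values_per_image(448)

print(small)          # 150528
print(large)          # 602112
print(large / small)  # 4.0
```

Doubling the side of the image multiplies the number of input values by four.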

The blocks argument is a tuple that contains the block for x, our input data, and the block for y, our labels.
A block represents a set of transformations to apply.
The ExifImageBlock rotates the images according to their EXIF metadata, and the CategoryBlock indicates that each image has exactly one label from a fixed set of categories, in my case statue or paint.

The parent_label function receives the image path as input and returns the name of the containing folder, my label.
So if the photo of a statue is inside the statue folder, the neural network will learn that that photo is a statue.
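The idea behind labeling by folder can be sketched in a couple of lines of plain Python. This is not fastai's code, just an illustration of the behavior described above; `label_from_parent` and the file names are made up for the example.

```python
from pathlib import Path

def label_from_parent(path):
    """Return the name of the folder that directly contains the file."""
    return Path(path).parent.name

# hypothetical file names, only the folder names match my dataset
print(label_from_parent("uffizi/statue/IMG_0001.jpg"))  # statue
print(label_from_parent("uffizi/paint/IMG_0002.jpg"))   # paint
```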

The RandomSplitter() without any parameters will create a validation dataset using 20% of the input dataset.
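Conceptually, a random split just shuffles the item indices and cuts off a slice for validation. The sketch below shows that idea in plain Python. It is not fastai's implementation: `random_split`, the `seed` argument, and the dataset size of 70 are assumptions made for the example.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle the indices 0..n_items-1 and reserve valid_pct of them for validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(valid_pct * n_items)
    # everything after the cut is used for training, the rest for validation
    return indices[cut:], indices[:cut]

# 70 is a hypothetical dataset size
train_idx, valid_idx = random_split(70, seed=42)
print(len(train_idx), len(valid_idx))  # 56 14
```

Because the split is random, every image ends up in exactly one of the two datasets, but which one changes from run to run unless you fix the seed.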

dls = datablock.dataloaders('/content/drive/MyDrive/uffizi', bs=5)
dls.show_batch()

The bs argument is the batch size: how many training examples the neural network processes at a time. The larger the batch size, the more memory you need; the smaller the batch size, the more weight updates are performed per epoch.
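The relation between batch size and number of updates is simple arithmetic: the weights are updated once per batch. Here is a minimal sketch, assuming a hypothetical training set of 56 images. The function `updates_per_epoch` is made up for the example, and the `drop_last` flag models the fact that a data loader may or may not skip the last incomplete batch.

```python
import math

def updates_per_epoch(n_train, batch_size, drop_last=False):
    """Number of batches (and therefore weight updates) in one epoch."""
    if drop_last:
        # the final, incomplete batch is skipped
        return n_train // batch_size
    return math.ceil(n_train / batch_size)

# 56 is a hypothetical number of training images
for bs in (5, 10):
    print(bs, updates_per_epoch(56, bs), updates_per_epoch(56, bs, drop_last=True))
# 5 12 11
# 10 6 5
```

With half the batch size, each epoch performs roughly twice as many updates, each one computed from fewer images.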

The show_batch method shows the first batch that the neural network will process.

All the images are labeled with their parent folder name.

Train the architecture into a model #

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(4)

Here we use a convolutional neural network (CNN), an architecture inspired by biological processes that is widely used in image recognition, NLP, and bioinformatics.

resnet34 is a Residual Network with 34 layers, pre-trained on the ImageNet dataset.

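The fine_tune(4) call first trains only the newly added final layers for one epoch, keeping the pre-trained layers frozen, and then trains the whole network on our dataset for 4 more passes. Each pass over the dataset is called an epoch.
After each epoch, it measures how good the model is by calculating the loss on the validation dataset.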
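To make the loss and the error rate less abstract, here is a small sketch of how the two numbers can be computed from predicted probabilities. It assumes a cross-entropy loss, which as far as I know is what fastai picks by default for a category target, and it is not fastai's implementation (that one works on tensors of raw scores). The function names and the three predictions are made up for the example.

```python
import math

def cross_entropy(probs, target):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -math.log(probs[target])

def mean_loss(batch_probs, targets):
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)

def error_rate(batch_probs, targets):
    """Fraction of examples whose most probable class is not the true one."""
    wrong = sum(1 for p, t in zip(batch_probs, targets) if p.index(max(p)) != t)
    return wrong / len(targets)

# hypothetical predictions, each one is [P(paint), P(statue)]
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
targets = [0, 1, 1]  # 0 = paint, 1 = statue

print(round(mean_loss(batch_probs, targets), 4))   # 0.4149
print(round(error_rate(batch_probs, targets), 4))  # 0.3333
```

The loss rewards confident correct answers and punishes confident mistakes, while the error rate only counts how many predictions are wrong.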

Looking at the output of the training process, we notice an issue: from the 2nd to the 4th epoch the training loss increases (from about 0.11 to 0.37), even though the validation loss stays low, which suggests that the learning process is not working correctly. I think the reason is the batch size and how it interferes with SGD (Stochastic Gradient Descent). It’s not completely clear to me why this happens, but I’ll investigate further and write a post about it.

epoch  train_loss  valid_loss  error_rate  time
0      0.774661    0.121791    0.071429    00:38

epoch  train_loss  valid_loss  error_rate  time
0      0.187633    0.038042    0.000000    00:49
1      0.109573    0.007019    0.000000    00:49
2      0.356412    0.003473    0.000000    00:49
3      0.373253    0.004066    0.000000    00:50

Let’s increase the batch size to 10 and re-launch the training.

datablock = DataBlock(
    get_items = get_image_files,
    item_tfms=Resize(224),
    blocks=(ExifImageBlock, CategoryBlock),
    get_y=parent_label,
    splitter=RandomSplitter())
dls = datablock.dataloaders('/content/drive/MyDrive/uffizi', bs=10)
dls.show_batch()
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(4)

With a batch size of 10 the training loss decreases at every epoch, as we expected. Our network is learning!!!

epoch  train_loss  valid_loss  error_rate  time
0      0.852532    0.558232    0.214286    00:37

epoch  train_loss  valid_loss  error_rate  time
0      0.359626    0.068505    0.000000    00:46
1      0.293746    0.003416    0.000000    00:46
2      0.260396    0.004584    0.000000    00:46
3      0.192967    0.005092    0.000000    00:46

A confusion matrix helps us understand where our neural network is wrong. We can plot it using the following code.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Nothing wrong so far 🚀
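For intuition, a confusion matrix is just a table that counts, for every actual label, how often each label was predicted. Here is a minimal sketch in plain Python. The function `confusion_matrix` is made up for the example, and the labels below are hypothetical, with one deliberate mistake; they are not the results of my model.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are the actual labels, columns are the predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

labels = ["paint", "statue"]
# hypothetical data: the second painting is mistaken for a statue
actual = ["paint", "paint", "statue", "statue", "statue"]
predicted = ["paint", "statue", "statue", "statue", "statue"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# paint [1, 1]
# statue [0, 3]
```

Correct predictions land on the diagonal, so a perfect model has zeros everywhere else.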

Testing the neural network #

Now we create an image object using a painting that the model has never seen.

Using an image that the network has never seen is very important: you can’t test the network with images contained in your input dataset. It’s cheating if you do that.

paint = PILImage.create('/content/dechirico1.jpg')
paint = PILImage(paint.resize((430, 224)))
paint
learn.predict(paint)
('paint', tensor(0), tensor([0.9841, 0.0159]))

Hurray!!! The network has confidently classified the image as a painting: it’s 98.41% sure about that. Let’s try another painting, one that contains a statue 🙄

paint = PILImage.create('/content/dechirico2.jpg')
paint = PILImage(paint.resize((224, 224)))
paint
learn.predict(paint)
('paint', tensor(0), tensor([0.9972, 0.0028]))

It’s 99.72% sure about the paint category 🎉
And finally, Lady Liberty.

paint = PILImage.create('/content/ladyliberty.jpg')
paint = PILImage(paint.resize((224, 224)))
paint
learn.predict(paint)
('statue', tensor(1), tensor([0.0070, 0.9930]))

It’s 99.3% sure that it is a statue 🗽
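The three values printed by learn.predict in the examples above are the label, its index, and one probability per category. The sketch below shows how they relate to each other. It is not fastai code: `decode_prediction` is a made-up helper, and the category order (paint = 0, statue = 1) is inferred from the outputs above.

```python
def decode_prediction(probs, vocab):
    """Pick the most probable category and return (label, index, confidence)."""
    index = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[index], index, probs[index]

vocab = ["paint", "statue"]  # category order inferred from the outputs above

label, index, confidence = decode_prediction([0.0070, 0.9930], vocab)
print(label, index, f"{confidence:.2%}")  # statue 1 99.30%
```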

Conclusions #

I am amazed that, with so few images, it’s possible to train a neural network that classifies images it has never seen during training.

One of the most important things for getting a good result is having a good training dataset and a good validation dataset. The organization of the input dataset is crucial for a model capable of making good predictions.

Deep learning is not just for image recognition: it can be applied to many fields, including natural language processing, bioinformatics, and recommendation systems.
I’m still just scratching the surface of deep learning. There are so many things to learn, and I’ll continue to study it and write about it.
