MNIST Handwritten digits classification using Keras (Part 1)

Deploying a Keras model to production, Part 1: MNIST handwritten digits classification using Keras

Hello everyone, this is part one of a two-part tutorial series on how to deploy a Keras model to production. In this part, we are going to discuss how to classify MNIST handwritten digits using Keras. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. For our purposes, we will be using the TensorFlow backend on Python 3.6. I have divided the tutorial series into two parts:

  1. MNIST Handwritten digits classification using Keras
  2. Deploying Keras model to production using flask

By the end of the tutorial series, you will be able to build and deploy your very own handwritten digit classifier that looks something like this:
MNIST digit classification in browser

Without further ado, let’s get started. First, let’s create a virtual environment and install all the necessary dependencies.

Create a virtualenv and install the necessary dependencies

Why a Python virtual environment is needed has already been discussed in another post here, so I won't repeat that. Let's get started with setting up virtualenv and installing the necessary dependencies for Python 3.6. The easiest way to install virtualenv is through pip. Simply type:

pip3 install --upgrade virtualenv

Once the virtualenv is installed, you can create separate virtual environments for each of your projects. Simply go to the project directory and type:

virtualenv kerasenv

You will see a message in your terminal like:

Installing setuptools, pip, wheel…done.
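Note: if your system's default python still points at Python 2, you can tell virtualenv explicitly which interpreter to use with the -p flag:

virtualenv -p python3 kerasenv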

A newly created virtualenv contains an activate shell script inside its bin/ directory, so you can activate the environment with:

source kerasenv/bin/activate

Now, we are ready to install the necessary dependencies. The list of dependencies we will need for our project is as follows:

  1. tensorflow (1.5.0)
  2. Keras (2.1.4)
  3. Flask (0.12.2)
  4. h5py (2.7.1)

You can install these all at the same time using the command:

pip3 install tensorflow keras Flask h5py
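If you would rather pin the exact versions listed above, you can instead run:

pip3 install tensorflow==1.5.0 Keras==2.1.4 Flask==0.12.2 h5py==2.7.1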

Computation is much faster if you have a GPU, but you'll need the GPU version of TensorFlow. If you plan on using tensorflow-gpu instead, you can follow our other article here to learn how to install it.
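For reference, the GPU build is installed the same way (it additionally needs a compatible CUDA and cuDNN setup; see that article for the details):

pip3 install tensorflow-gpu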

Other required dependencies such as scipy and numpy should be installed automatically along with these packages.
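To confirm that everything installed correctly, you can start a Python interpreter inside the virtualenv and check the versions (the numbers below are simply the ones listed above; newer versions may also work):

# Quick sanity check of the installed packages and their versions
import tensorflow as tf
import keras
import flask
import h5py

print(tf.__version__)     # e.g. 1.5.0
print(keras.__version__)  # e.g. 2.1.4
print(flask.__version__)  # e.g. 0.12.2
print(h5py.__version__)   # e.g. 2.7.1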

Introduction to Convolutional Neural Network (CNN)

Now, we are ready to build a Convolutional Neural Network (CNN) to classify MNIST handwritten digits. But first, we must understand what a CNN is. We will only cover the basic theory of CNNs in this article; I highly recommend the materials of the CS231n course if you want a deeper understanding of how CNNs work.

In machine learning, a Convolutional Neural Network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that has been applied successfully to analyzing visual imagery. CNNs make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. There are three main types of layers used to build ConvNet architectures: the Convolutional Layer, the Pooling Layer, and the Fully-Connected Layer. We stack these layers to form a full ConvNet architecture.

CNN architecture

Image source: Wikipedia

The CONV layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume. The POOL layer performs a downsampling operation along the spatial dimensions (width, height). The FC (fully-connected) layer computes the class scores, resulting in a volume of size [1x1x10] in our case, where each of the 10 numbers corresponds to a class score (the CS231n notes use the 10 categories of CIFAR-10 as their example; in our case the 10 classes are the digits 0 through 9). All this may seem confusing right now, so again I recommend the CS231n materials if you want a deeper understanding of how CNNs work. For now, all we need to know is that CNNs are one of the best available tools for machine vision, and we will use one to classify MNIST handwritten digits.
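To make the CONV layer's dot product a little more concrete, here is a tiny NumPy sketch (purely illustrative, with made-up numbers) of what a single convolutional unit computes at one location of its input:

import numpy as np

# A 3x3 region of the input and a 3x3 filter (the weights are learned during training)
patch = np.array([[0.0, 0.5, 1.0],
                  [0.0, 0.5, 1.0],
                  [0.0, 0.5, 1.0]])
weights = np.random.rand(3, 3)
bias = 0.1

# The unit's output: element-wise product summed up (a dot product), plus a bias,
# passed through the ReLU non-linearity
activation = np.sum(patch * weights) + bias
output = max(0.0, activation)
print(output)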

MNIST Handwritten digits classification using Keras

Now that we have all our dependencies installed and a basic understanding of CNNs, we are ready to classify MNIST handwritten digits. The Keras GitHub project provides an example file for MNIST handwritten digit classification using a CNN (link: https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py). The classifier gets to 99.25% test accuracy after 12 epochs. Rather than reinvent the wheel, we will simply modify that example file to fit our needs. The code is well commented and mostly self-explanatory; still, we will go through each portion of it to understand what is being done.

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

First, we import the necessary dependencies.

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Here, we define a batch size of 128. The batch size is the number of samples propagated through the network before the weights are updated; training in batches requires less memory and is especially important if the whole dataset does not fit in memory. With 60,000 training images and a batch size of 128, each epoch performs roughly 469 weight updates. Since MNIST handwritten digits have an input dimension of 28x28, we define the image rows and columns as 28, 28. The function mnist.load_data() downloads the dataset, splits it into training and testing sets, and returns it in the format (training_x, training_y), (testing_x, testing_y). Some of you might get an error saying "IOError: CRC check failed 0xc187cf56L != 0x14c5212fL" from this function when running the file. If you do, open the file kerasenv/lib/python3.6/site-packages/keras/utils/data_utils.py and, just below the import statements, add:

import requests
requests.packages.urllib3.disable_warnings()
import ssl

try:
   _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
   # Legacy Python that doesn't verify HTTPS certificates by default
   pass
else:
   # Handle target environment that doesn't support HTTPS verification
   ssl._create_default_https_context = _create_unverified_https_context

If you don’t get the error, there’s no need to do this.
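For reference, here is a quick check of what mnist.load_data() returns once the download succeeds; the shapes are a property of the MNIST dataset itself (60,000 training and 10,000 test images of 28x28 grayscale pixels, with integer labels from 0 to 9):

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(x_test.shape)   # (10000, 28, 28)
print(y_test.shape)   # (10000,)
print(y_train[:5])    # the first few labels, e.g. [5 0 4 1 9]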

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

In this portion of the code, we reshape the input data into the format our convolutional neural network expects, adding an explicit channel dimension whose position depends on the backend's image data format, and we scale the pixel values from the 0-255 range down to the 0-1 range.
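To make this concrete, a quick check (continuing from the code above, and assuming the default TensorFlow backend, where image_data_format() returns 'channels_last'):

# MNIST images are grayscale, so the reshape just appends a channel dimension of size 1:
# x_train: (60000, 28, 28) -> (60000, 28, 28, 1)
# x_test:  (10000, 28, 28) -> (10000, 28, 28, 1)
print(input_shape)    # (28, 28, 1) here; (1, 28, 28) with a channels_first backend
print(x_train.shape)  # (60000, 28, 28, 1)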

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

The network's softmax output and the categorical cross-entropy loss expect the labels as one-hot encoded vectors (binary class matrices), so we convert the integer y labels for both the training and test data with to_categorical.
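As a small, purely illustrative example of what to_categorical does to a few integer labels:

import numpy as np
from keras.utils import to_categorical

# Each integer label becomes a length-10 vector with a 1 at the label's index
print(to_categorical(np.array([0, 3, 9]), num_classes=10))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
# (exact print formatting depends on your NumPy version)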

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

The actual design of the neural network is done in this portion of the code. First, we define the model as a Sequential model. We stack convolutional and pooling layers, along with Dropout layers. Dropout provides a simple way to reduce overfitting by randomly setting a fraction of a layer's outputs to zero during training; this forces the remaining neurons to learn more robust, redundant representations instead of relying on any single unit. The last layer of the network has a number of nodes equal to the number of output classes, i.e. 10, and uses the "softmax" activation function so its outputs can be read as class probabilities.

There are a number of concepts here, such as activation functions (e.g. relu, softmax), the categorical cross-entropy loss, and the Adadelta optimizer, which I will not explain in this post. We don't strictly need them for our purpose right now, but I do suggest you learn about them to get a better understanding of how to design neural networks.

The model.fit function trains the model for a fixed number of epochs (iterations on a dataset).
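If you want to inspect the architecture layer by layer, Keras can print a summary of every layer's output shape and parameter count; you can add this line anywhere after the layers have been added to the model:

# Prints one row per layer with its output shape and number of trainable parameters
model.summary()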

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

This part of the code prints the loss and accuracy of the final model after the training is complete. In my case, the test loss was around 0.03 and the test accuracy, 0.9898.

We need to add a few lines of code at the end to save the structure of the model as well as its weights, which we'll later load using Flask: the architecture goes into a JSON file (model.json) and the weights into an HDF5 file (model.h5), which is why h5py is one of our dependencies. Add the following code at the end of the file.

model_json = model.to_json()

with open("model.json", "w") as json_file:
    json_file.write(model_json)

model.save_weights("model.h5")
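As a quick sanity check that the two files were written correctly (and a preview of what part 2 will do inside Flask), the saved architecture and weights can be loaded back as shown below; this sketch assumes it runs right after the training script, so x_test is still in scope:

from keras.models import model_from_json

# Rebuild the architecture from the JSON file, then load the trained weights into it
with open("model.json", "r") as json_file:
    loaded_model = model_from_json(json_file.read())
loaded_model.load_weights("model.h5")

# Class probabilities (10 softmax outputs) for the first test image
print(loaded_model.predict(x_test[:1]))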

The complete code for the file is posted below:

'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model_json = model.to_json()

with open("model.json", "w") as json_file:
    json_file.write(model_json)

model.save_weights("model.h5")

Save the file as "mnist_cnn.py" and run it with the command python3 mnist_cnn.py. Training might take some time depending on your machine; when it's complete, you should see output like:

Epoch 11/12

60000/60000 [==============================] - 128s 2ms/step - loss: 0.0269 - acc: 0.9917 - val_loss: 0.0264 - val_acc: 0.9915

Epoch 12/12

60000/60000 [==============================] - 121s 2ms/step - loss: 0.0274 - acc: 0.9915 - val_loss: 0.0311 - val_acc: 0.9898

Test loss: 0.03113338363569528

Test accuracy: 0.9898

You will also find the files "model.h5" and "model.json" in the working directory. We will load these files later using Flask to make predictions. That's all for this part of the tutorial. In the next part, we will learn how to deploy the Keras model to production using Flask and make predictions with the model files we created here. Like us on facebook to receive more updates.

 
