Understanding Computer Vision : Part 5

This tutorial covers the foundations of computer vision and is delivered as “Lesson 5” of the series. Upcoming lessons go as far as building your own deep-learning-based computer vision projects. You can find the complete syllabus and table of contents here

Target Audience: Final-year college students, people new to a data science career, and IT employees who want to switch to a data science career.

Takeaway: The main takeaways from this article are:

  1. Image Classification Using Machine Learning
  2. Image Classification : Machine Learning way vs Deep Learning way

Image Classification

Broadly, image classification is the task of assigning a label to an image from a set of pre-defined categories. FIG 5.1 depicts the difference between image classification and the other processes we can perform on an image using computer vision.


In other words, we pass an image to the algorithm, and the algorithm returns a label, as a string, from a pre-defined set of categories, as shown in the first quadrant ((a) Image Classification) of FIG 5.1. The other quadrants of FIG 5.1 show some of the other tasks we can perform in computer vision using machine learning and deep learning. We will look into them as we move forward in the course.

Let us assume a set of pre-defined categories: Categories = {cat, fish, elephant}

We then input the image below (FIG 5.2) to the image classification system:


The image classification system outputs a label from the set of categories {cat, fish, elephant}: in this case, fish.

Instead of a single label, our image classification system could also assign a probability to each category, such as cat: 0%, fish: 99% and elephant: 0%.
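How such probabilities are produced depends on the classifier. One common choice, assumed here rather than stated in this lesson, is to pass the classifier's raw per-class scores through a softmax function and then report the category with the highest probability. A minimal NumPy sketch; the raw scores are made-up numbers for illustration:

```python
import numpy as np

CATEGORIES = ["cat", "fish", "elephant"]

def softmax(scores: np.ndarray) -> np.ndarray:
    """Turn raw per-class scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exp / exp.sum()

def predict(scores: np.ndarray) -> tuple[str, dict[str, float]]:
    """Return the most probable category together with every category's probability."""
    probs = softmax(scores)
    table = {name: float(p) for name, p in zip(CATEGORIES, probs)}
    return CATEGORIES[int(np.argmax(probs))], table

# Hypothetical raw scores a classifier might produce for the fish image (FIG 5.2).
label, probs = predict(np.array([-2.0, 5.0, -1.5]))
print(label)  # fish
print({name: round(p, 3) for name, p in probs.items()})
```

The single label is simply the category with the largest probability, so both kinds of output come from the same computation.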

The above approach is known as Supervised Learning: our input data consists of the images together with the label associated with each image, which allows us to teach our classifier what each category looks like. Here, the categories in the pre-defined set we saw earlier are the labels.

The training dataset for the above image classification system would look like FIG 5.3:


Image Classification System:

Let’s look at the steps involved in assigning a label to an image from a set of pre-defined labels, i.e., building an image classifier:

  • Step 1: Creating your dataset
  • Step 2: Splitting the dataset into training and testing dataset
  • Step 3: Feature Extraction
  • Step 4: Training your classification model
  • Step 5: Evaluating your classifier

Creating your dataset

The first step in creating an image classification pipeline is to create a dataset relevant to the problem we are trying to solve. The dataset contains each image and the label associated with it.
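A common way to organise such a dataset, assumed here rather than prescribed by this lesson, is one sub-folder per category (for example `dataset/cat/`, `dataset/fish/`, `dataset/elephant/`), so that the folder name doubles as the label. The sketch below collects `(image_path, label)` pairs from that hypothetical layout using only the standard library; the accepted file extensions are also an assumption.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed image formats

def load_dataset(root: str) -> list[tuple[Path, str]]:
    """Return (image_path, label) pairs, using each sub-folder name as the label."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, class_dir.name))
    return samples
```

Usage, with a hypothetical folder name: `samples = load_dataset("dataset")`. Keeping the labels in the folder structure means no separate label file can drift out of sync with the images.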

In the example shown in FIG 5.3, the dataset should be balanced, i.e., each category should have roughly the same number of images. If we have twice as many cat images as fish images, and five times as many elephant images as cat images, our classifier will naturally become biased toward these heavily represented categories.
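Before training, it is worth checking how the labels are distributed. A minimal sketch with `collections.Counter`; the counts are invented to mirror the imbalanced example above (twice as many cats as fish, five times as many elephants as cats):

```python
from collections import Counter

def class_distribution(labels: list[str]) -> dict[str, float]:
    """Return each label's share of the dataset as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical imbalanced dataset: 50 fish, 100 cats, 500 elephants.
labels = ["fish"] * 50 + ["cat"] * 100 + ["elephant"] * 500
for label, share in sorted(class_distribution(labels).items()):
    print(f"{label:>8}: {share:.1%}")
```

For this hypothetical dataset the shares are roughly cat 15.4%, elephant 76.9% and fish 7.7%, so a classifier that always answered “elephant” would already be right about 77% of the time. That is exactly the bias described above.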

There is no rule of thumb for how large the dataset must be. When the dataset is small, deep learning practitioners often employ a technique called Transfer Learning. We will see more about Transfer Learning later in this course.

Splitting the dataset into training and testing dataset

We split the dataset into a training set and a testing set. The training set is used by our classifier to learn what each category looks like: the classifier makes predictions on the input data and is corrected when those predictions are wrong. The testing set is used to evaluate the classifier’s performance by comparing the predicted labels against the actual labels in the testing set, from which we can draw a confusion matrix and derive the accuracy.

The testing set must be entirely independent of the training set, because it is used only to check the performance of our classifier.

The sizes of the training and testing sets are up to the developer; some common splits are:

  • Training : Testing = 66.7% : 33.3%
  • Training : Testing = 75% : 25%
  • Training : Testing = 90% : 10%
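A minimal sketch of a shuffled split using NumPy only; the 75% : 25% ratio, the fixed seed and the dummy samples are example choices. Libraries such as scikit-learn offer a ready-made `train_test_split` function, but this version shows what such a split does: shuffle once, then cut.

```python
import numpy as np

def split_dataset(samples: list, train_fraction: float = 0.75, seed: int = 42):
    """Shuffle samples and split them into disjoint training and testing lists."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(samples))  # shuffle so the split isn't ordered by class
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]  # no index appears in both lists
    return train, test

# Dummy (filename, label) pairs standing in for a real dataset.
samples = [(f"img_{i}.jpg", "cat" if i % 2 else "fish") for i in range(100)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 75 25
```

Because each index lands in exactly one list, the testing set stays independent of the training set. For small or imbalanced datasets you may prefer a stratified split, which keeps each category’s proportion the same in both sets.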


Feature Extraction

We need to extract features that quantify and represent each image in an abstract, numerical form.

As we learnt in the first few lessons of this course, images are represented as matrices of pixels. Sometimes we may even use the raw pixel intensities of the images themselves as feature vectors.
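Using raw pixel intensities as a feature vector simply means flattening the pixel matrix into one long vector. A minimal NumPy sketch; the random 32×32 RGB array stands in for a real image, and resizing every image to one fixed size beforehand (assumed here) is what makes all the vectors the same length:

```python
import numpy as np

def raw_pixel_features(image: np.ndarray) -> np.ndarray:
    """Flatten an H x W x C image into a 1-D float vector scaled to [0, 1]."""
    return image.astype(np.float32).ravel() / 255.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
features = raw_pixel_features(image)
print(features.shape)  # (3072,)
```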

In most machine learning approaches, however, we use feature extractors such as the following to quantify an image as a feature vector:

  • Color Histograms
  • Histogram of Oriented Gradients
  • Local Binary Patterns
  • Hu Moments
  • Keypoint Detectors : BRISK, FAST, STAR etc…
  • Local Invariant Descriptors : SIFT, SURF etc…
  • Binary Descriptors : BRIEF, FREAK etc…
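As an example of the first extractor in the list, a color histogram counts how many pixels fall into each intensity range, channel by channel. A minimal NumPy sketch; the 8 bins per channel and the normalisation are illustrative choices (OpenCV’s `cv2.calcHist` is a common alternative):

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel intensity histograms into one normalised feature vector."""
    channel_hists = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        channel_hists.append(hist)
    features = np.concatenate(channel_hists).astype(np.float64)
    return features / features.sum()  # normalise so that image size does not matter

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
print(color_histogram(image).shape)  # (24,)
```

Unlike raw pixels, this 24-value vector ignores where colors appear in the image, so it tolerates small shifts but cannot tell shapes apart. This is one reason practitioners often combine several of the extractors listed above.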

In a deep learning approach, however, we don’t go to the trouble of converting an image into a feature vector ourselves. A CNN (Convolutional Neural Network) is an end-to-end model: the network learns to extract its own features in its hidden layers. We will look at this in detail later in this course.

So you don’t need to worry too much about the machine learning way of doing image classification, but it is good to know that these techniques exist, since this course focuses on computer vision using deep learning.

Training your classification model

Since this part of the lesson follows the machine learning approach, we can use any of the following machine learning algorithms to distinguish between categories:

  • Support Vector Machines
  • Logistic Regression
  • Decision Tree
  • Random Forests
  • K-Nearest Neighbor

The chosen algorithm takes the extracted feature vectors as input and outputs the label associated with each image.
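To make this concrete, here is a minimal k-Nearest Neighbor classifier, one of the algorithms listed above, written with NumPy only. It stores the training feature vectors and labels, then labels a new vector by majority vote among its `k` closest training vectors (Euclidean distance). The 2-D feature vectors are made up; in practice they would come from an extractor such as a color histogram. Libraries like scikit-learn provide tuned implementations of all five algorithms.

```python
from collections import Counter
import numpy as np

class KNearestNeighbor:
    """Minimal k-NN classifier: 'training' just memorises the data."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, features: np.ndarray, labels: list[str]) -> "KNearestNeighbor":
        self.features = np.asarray(features, dtype=float)
        self.labels = list(labels)
        return self

    def predict(self, queries: np.ndarray) -> list[str]:
        predictions = []
        for q in np.atleast_2d(queries):
            distances = np.linalg.norm(self.features - q, axis=1)  # distance to every training vector
            nearest = np.argsort(distances)[: self.k]               # indices of the k closest vectors
            votes = Counter(self.labels[i] for i in nearest)
            predictions.append(votes.most_common(1)[0][0])          # majority label
        return predictions

# Hypothetical 2-D feature vectors for two categories.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])
y_train = ["fish", "fish", "fish", "cat", "cat", "cat"]
model = KNearestNeighbor(k=3).fit(X_train, y_train)
print(model.predict(np.array([[0.12, 0.18], [0.88, 0.82]])))  # ['fish', 'cat']
```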

Don’t worry if these machine learning algorithms are new to you; we will dive deep into them in the next lesson.

Evaluating your classifier

Lastly, we evaluate the labels that the machine learning algorithm outputs by comparing the predicted labels against the ground-truth labels from our testing set.

The ground-truth labels represent each image’s actual category. From these, we can count how many predictions our classifier got right and compute aggregate metrics such as precision, recall and F-measure, which quantify the performance of our classifier as a whole, as shown below:
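These metrics can all be derived from a confusion matrix of ground-truth versus predicted labels. Per class, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-measure (F1) is their harmonic mean. A minimal NumPy sketch; the label lists are invented for illustration, and scikit-learn’s `classification_report` produces a similar table.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix

def per_class_metrics(matrix):
    """Return precision, recall and F1 arrays, one entry per class."""
    tp = np.diag(matrix).astype(float)
    predicted = matrix.sum(axis=0)  # TP + FP for each class (column sums)
    actual = matrix.sum(axis=1)     # TP + FN for each class (row sums)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

classes = ["cat", "fish", "elephant"]
y_true = ["cat", "cat", "fish", "fish", "elephant", "elephant"]
y_pred = ["cat", "fish", "fish", "fish", "elephant", "cat"]
cm = confusion_matrix(y_true, y_pred, classes)
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.67
```

In this toy run, “fish” has perfect recall (every fish was found) but lower precision (one cat was also called a fish), which is the kind of trade-off a single accuracy number hides.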


Image Classification : Machine Learning way vs Deep Learning way


In an end-to-end deep learning approach, however, we tackle image classification in an entirely different way. The steps involved are:

  • Step 1: Creating your dataset
  • Step 2: Splitting the dataset into training and testing dataset
  • Step 3: Training your Network
  • Step 4: Evaluating your classifier

Yes, we are skipping the feature extraction step: we don’t need to convert the images into feature vectors ourselves.


This is because CNNs are end-to-end models. We present the raw input data (pixels) to the network, which then learns filters in its hidden layers that can discriminate among object classes. The output of the network is a probability distribution over the class labels.
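To give a feel for what a “filter” is, the sketch below slides a 3×3 filter over a grayscale image and sums the element-wise products at each position, producing a feature map. This is the core operation of a convolutional layer (deep learning libraries actually compute cross-correlation, as here, and still call it convolution). The filter values are hand-picked to respond to vertical edges; in a CNN they would be learned during training. NumPy only, with no padding or stride, for illustration.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=float)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Element-wise product of the filter and the image patch under it, summed.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 4x4 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 10, 10]] * 4, dtype=float)
# Hand-picked vertical-edge filter (a CNN would learn these weights).
kernel = np.array([[1, 0, -1]] * 3, dtype=float)
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # every entry is -30: a strong response at the edge
```

A uniform region produces 0 everywhere, because the positive and negative columns cancel; the filter responds only where intensity changes from left to right. A CNN stacks many such filters, with learned weights, across its hidden layers.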

One of the exciting aspects of using CNNs is that we no longer need to fuss over hand-engineered features: we can let our network learn the features instead. However, this trade-off does come at a cost. Training CNNs can be a non-trivial process, so be prepared to spend considerable time familiarizing yourself with the process and running many experiments to determine what does and does not work.

This course, “Computer Vision using Deep Learning”, is designed with a deep learning mindset. Going forward, we will get into the details of neural networks and convolutional neural networks.

To read the other lessons in this course, see the article containing the complete syllabus and table of contents.


Bay of Tech: “Affordable technology solutions to everyone” | BoT provides solutions in Industry 4.0 | This space is the perspective page of BoT