Understanding Computer Vision: Part 4

This tutorial covers the foundations of computer vision, delivered as Lesson 4 of the series. More lessons are upcoming that will go as far as building your own deep-learning-based computer vision projects. You can find the complete syllabus and table of contents here

Target Audience: final-year college students, people new to a data science career, and IT employees who want to switch to a data science career.

Takeaway: The main takeaways from this article are:

  1. Morphological operations
  2. Exercise to extract the tabular structure in an invoice using Morphological operations

Morphological operations

We can use morphological operations to increase the size of objects in images as well as decrease them.

We can also utilize morphological operations to close gaps between objects as well as open them.

Some of the common morphological operations:

  • Erosion
  • Dilation
  • Opening
  • Closing
  • Morphological gradient
  • Black hat
  • Top hat (or “White hat”)

Sometimes we don't have to use fancy algorithms to solve computer vision problems. For example, a few months back I was working on extracting the contents of an invoice, specifically the contents inside its tabular structure. For problems like this, you are more likely to find a solution using morphological operations.

Erosion:

Erosion works by defining a structuring element and then sliding this structuring element from left-to-right and top-to-bottom across the input image.

A foreground pixel in the input image will be kept only if ALL pixels inside the structuring element are > 0. Otherwise, the pixels are set to 0 (i.e. background).

Erosion is useful for removing small blobs in an image or disconnecting two connected objects.

Fig 4.1 EROSION APPLIED TO OUR ORIGINAL IMAGE. THE MORE ITERATIONS, THE STRONGER THE EROSION!

# Erosion of text with 2 different iteration counts

import cv2
import argparse

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

eroded2 = cv2.erode(gray.copy(), None, iterations=2)
cv2.imshow("Eroded Image with 2 Iterations", eroded2)

eroded4 = cv2.erode(gray.copy(), None, iterations=4)
cv2.imshow("Eroded Image with 4 Iterations", eroded4)

cv2.waitKey(0)

Morphological operations are typically applied to grayscale (black and white) images, though there are exceptions. We therefore convert the color image to grayscale by calling the cv2.cvtColor function.

We perform the actual erosion on the next line by making a call to the cv2.erode function. This function takes two required arguments and a third optional one. The first argument is the image that we want to erode; in this case, it's our binary image. The second argument to cv2.erode is the structuring element. If this value is None, then a default 3×3 structuring element, identical to the 8-neighborhood structuring element we saw above, will be used. Of course, you could supply your own custom structuring element here instead of None as well.

The last, optional argument is the number of times the erosion will be performed. As the number of iterations increases, we'll see more and more of the original image characters eaten away.

Dilation:

Dilations increase the size of foreground objects and are especially useful for joining broken parts of an image together.

Dilations, just as an erosion, also utilize structuring elements — a center pixel p of the structuring element is set to white if ANY pixel in the structuring element is > 0.

Fig 4.2 DILATION APPLIED TO OUR ORIGINAL IMAGE. THE MORE ITERATIONS, THE STRONGER THE DILATION!

# Dilation of text with 2 different iteration counts

import cv2
import argparse

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

dilated2 = cv2.dilate(gray.copy(), None, iterations=2)
cv2.imshow("Dilated Image with 2 Iterations", dilated2)

dilated4 = cv2.dilate(gray.copy(), None, iterations=4)
cv2.imshow("Dilated Image with 4 Iterations", dilated4)

cv2.waitKey(0)

The actual dilation is performed by making a call to the cv2.dilate function, whose function signature is identical to that of cv2.erode.

Opening:

Performing an opening operation allows us to remove small blobs from an image: first an erosion is applied to remove the small blobs, then a dilation is applied to regrow the size of the original object.

Fig 4.3 APPLYING A MORPHOLOGICAL OPENING OPERATION TO OUR INPUT IMAGE.

# Opening with a kernel size of (10, 10)

import cv2
import argparse

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

kernelSize = (10, 10)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
opening = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
cv2.imshow("Opening Image for kernel size 10,10:", opening)

cv2.waitKey(0)

Here we make a call to cv2.getStructuringElement to build our structuring element. The cv2.getStructuringElement function requires two arguments: the first is the type of structuring element we want, and the second is the size of the structuring element (we have defined our kernel size to be (10,10)).

We pass in a value of cv2.MORPH_RECT to indicate that we want a rectangular structuring element. But you could also pass in a value of cv2.MORPH_CROSS to get a cross shape structuring element (a cross is like a 4-neighborhood structuring element, but can be of any size), or cv2.MORPH_ELLIPSE to get a circular structuring element. Exactly which structuring element you use is dependent upon your application — and I’ll leave it as an exercise to the reader to play with each of these structuring elements.

The actual opening operation is performed on the next line by making a call to the cv2.morphologyEx function. This function is abstract in a sense: it allows us to pass in whichever morphological operation we want, followed by our kernel/structuring element. Please try passing other operations and see the difference for yourself.

The first required argument of cv2.morphologyEx is the image we want to apply the morphological operation to. The second argument is the actual type of morphological operation — in this case, it’s an opening operation. The last required argument is the kernel/structuring element that we are using.

Opening allows us to remove small blobs in an image. However, we can also use it for other tasks, such as extracting the horizontal and vertical lines in an image. We will see more about this in the final exercise below, where we will extract the tabular structure from an invoice using morphological operations.

Closing:

As the name suggests, a closing is used to close holes inside of objects or for connecting components together.

Fig 4.4 APPLYING A MORPHOLOGICAL CLOSING OPERATION TO OUR INPUT IMAGE.

# Closing with a kernel size of (13, 13)

import cv2
import argparse
import numpy as np

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

kernelSize = (13, 13)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
closing = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
cv2.imshow("Closing performed with kernel size of 13,13", closing)

cv2.waitKey(0)

Morphological gradient:

A morphological gradient is the difference between a dilation and an erosion. It is useful for determining the outline of an object in an image.

Fig 4.5 A MORPHOLOGICAL GRADIENT CAN BE USED TO FIND THE OUTLINE OF AN OBJECT IN AN IMAGE.

# Morphological gradient applied to extract the boundary of an image

import cv2
import argparse
import numpy as np

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

kernelSize = (7, 7)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
cv2.imshow("Gradient", gradient)

cv2.waitKey(0)

Top Hat & Black Hat:

A top hat operation is used to reveal bright regions of an image on dark backgrounds. As seen in Fig 4.6, the results of the top hat aren't very exciting. Notice how only regions that are light against a dark background are clearly displayed; in this case, we can clearly see that the license plate region of the car has been revealed.

But also note that the license plate characters themselves have not been included. This is because the license plate characters are dark against a light background. And to reveal our license plate characters we’ll need the black hat operator.

The black hat operator is simply the opposite of the white hat operator! The results of the black hat are a lot more promising, as the license plate text itself is darker than the license plate background.


Fig 4.6 APPLYING A TOP HAT OPERATION REVEALS LIGHT REGIONS ON A DARK BACKGROUND. APPLYING THE BLACK HAT OPERATOR REVEALS THE DARK LICENSE PLATE TEXT AGAINST THE LIGHT LICENSE PLATE BACKGROUND.

# Top Hat and Black Hat performed on a car number plate

import cv2
import argparse
import numpy as np

apr = argparse.ArgumentParser()
apr.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(apr.parse_args())

image = cv2.imread(args["image"])

print(f'(Height, Width, Depth) of the image is: {image.shape}')

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# define the rectangular kernel before using it
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))

# a tophat (also called a "whitehat") operation enables us to find light regions on a dark background
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rectKernel)
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

# show the output images
cv2.imshow("Tophat", tophat)
cv2.imshow("Blackhat", blackhat)

cv2.waitKey(0)

Exercise:

Extract the tabular structure from an invoice using morphological operations.

The invoice document looks like this:

Fig 4.7 INPUT INVOICE DOCUMENT AS AN IMAGE

# Solution

# import the necessary packages
import argparse
import cv2
import numpy as np

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Original", image)

# invert the image (~gray) so the dark ink becomes white foreground,
# then dilate so there are no broken parts in the objects
kernel = np.ones((3, 3), np.uint8)
dilation = cv2.dilate(~gray.copy(), kernel, iterations=1)

# a short, wide kernel keeps only horizontal lines;
# a tall, thin kernel keeps only vertical lines
h_kernel = np.ones((1, 100), np.uint8)
h_lines = cv2.morphologyEx(dilation, cv2.MORPH_OPEN, h_kernel)
v_kernel = np.ones((100, 1), np.uint8)
v_lines = cv2.morphologyEx(dilation, cv2.MORPH_OPEN, v_kernel)

# combine the two line images into the tabular structure
# (cv2.add saturates at 255 instead of wrapping around at intersections)
line_img = cv2.add(h_lines, v_lines)
cv2.imshow("component", line_img)
cv2.waitKey(0)

Result:

Fig 4.8 DESIRED RESULT OF INVOICE WITH TABULAR STRUCTURE EXTRACTED

In the code above, the first several lines are familiar to us: we load the image and convert it to grayscale.

The next two lines dilate the objects in the image. The ~gray inverts the image first, so that the dark ink becomes the white foreground; the dilation then ensures that there are no broken parts in the objects across the image.

The next four lines extract all the horizontal lines and all the vertical lines in the image, using an opening operation with a horizontal kernel and a vertical kernel respectively. After extracting the horizontal and vertical lines, we add the two line images together into line_img, which joins the lines at their points of intersection and yields the tabular structure.

However, it is the line kernels passed to the horizontal and vertical line detectors that make the difference in detecting the lines:

np.ones((1,100), np.uint8): horizontal line kernel (one pixel tall, 100 pixels wide)

np.ones((100,1), np.uint8): vertical line kernel (100 pixels tall, one pixel wide)

To read the other lessons from this course, jump to this article to find the complete syllabus and table of contents.


Bay of Tech: "Affordable technology solutions to everyone" | BoT provides solutions in Industry 4.0 | This space is the perspective page of BoT