Computer Vision and Deep Learning - Part 1

Rupali Garewal
Published in Analytics Vidhya
5 min read · Feb 8, 2021

Introduction

In this series on robotics, I have chosen to start with Computer Vision and Deep Learning. One may ask: why get started with this topic first? The answer is that it builds a good foundation for writing code and shows how a machine perceives the world around it. A camera is a very common sensor that robots use to sense the environment and then perform actions. For this, we need to understand the what, where and how of computer vision. After introducing the topic, I will explain why we need to spice up this exceptional field with Deep Learning. So let’s get started!

Computer Vision

Image 1: RGB colour separation of the original image. The grey images show the contribution of each colour to the image (more white implies more contribution). Image credits: wikimedia.com

The cameras we most frequently use in smartphones interpret the environment in the form of binary digits (0 and 1). Weird, isn’t it? Yep, let me explain, taking the example of an RGB image. RGB stands for Red, Green and Blue: an image is a meaningful combination of these three colours (refer to Image 1). Let us say I have a plain red image on my phone. This means I have given full strength to red and zero to green and blue. You may ask: there can be shades of red, so how will you represent them? Here comes the concept of dividing the colour’s strength into 256 equal parts, so the darkness of red is now expressed on a 256-level scale.

One might ask where this number 256 came from, and why 256 parts and not any other number. For this, let’s look at how one byte of computer memory is expressed. One byte equals 8 bits, so dedicating one byte to red means dedicating 8 bits to that colour. The darkest red will look like 11111111 (255 in binary) and the faintest non-zero red like 00000001 (1 in binary). With 2 possible values per bit (0 or 1) raised to the power 8 (the number of bits dedicated to red), we get 256 levels, from 0 to 255. Green and blue are likewise each given a dedicated byte to express their strength. If we want more information in an image, we can use 16 bits or more per colour. Grey images are similar, but they have just one channel, and the pixel values represent shades between black and white.
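The one-byte-per-channel idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original article’s code: a plain red image is just an array where the red channel holds 255 (the darkest red on the 0–255 scale) while green and blue hold 0.

```python
import numpy as np

# A plain red 2x2 RGB image: one byte (0-255) per colour channel per pixel.
red_image = np.zeros((2, 2, 3), dtype=np.uint8)
red_image[:, :, 0] = 255  # red channel at full strength; green and blue stay 0

# Each channel value fits in one byte: 2 values per bit, 8 bits per byte.
levels = 2 ** 8  # 256 levels, from 0 to 255
```

Note that `np.uint8` is exactly the “one byte per colour” representation the text describes: it cannot hold any value outside 0–255.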

Every small box in the 8x10 image represents a pixel. Combining the R, G and B values of a pixel gives one pixel of the colour image; thus pixel A of the colour image is given by [R, G, B]. Image credits: geeksforgeeks.org

Everything we have talked about till now happens inside a pixel. Pixel? What is that? As the image above shows, a pixel is the basic building block of an image. Say you come across an 8x8 image: it contains 64 pixels, and every pixel is represented by a combination of the 3 colours, i.e. 3 bytes of memory. That is 64x3 = 192 bytes for an 8x8 image! We all love to take photographs in High Definition; consider 1080p, which is 1920 x 1080 pixels, or about 2.1 megapixels. Scaling up the calculation we performed for the 8x8 image, you can imagine how memory-heavy a high-definition image is.
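The memory arithmetic above generalises into a one-line formula. Here is a small sketch (the function name is my own, not from any library) that reproduces both the 8x8 and the 1080p calculations for raw, uncompressed images at 3 bytes per pixel:

```python
def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Raw (uncompressed) memory for an image, assuming one byte each for R, G, B."""
    return width * height * bytes_per_pixel

small = raw_image_bytes(8, 8)       # the 8x8 example: 64 pixels x 3 bytes
hd = raw_image_bytes(1920, 1080)    # a 1080p frame, over 6 million bytes
```

Real photo files are usually much smaller because formats like JPEG compress the data, but a robot’s vision pipeline typically works on the raw pixels, so this is the figure that matters for onboard memory.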

Choosing the right resolution for fetching images of the surroundings is crucial for a robot, because of the constant trade-off between more information (a higher-definition image) and the memory and processing load on the onboard processor. And since a camera is a power-demanding sensor, we also need to take care of the battery being used to power the robot.

Image 3: On the left, an image in RGB format; on the right, the same image in HSV format. Image credits: zedge.net
Image 4: Hue, Saturation and Value components of the above image.

The example I gave for the RGB image is rudimentary. Other image formats include HSV (Hue, Saturation, Value), HSL (Hue, Saturation, Lightness), CMY (Cyan, Magenta, Yellow), CMYK (CMY plus black) and YIQ, where

Y = 0.30R + 0.59G + 0.11B
I = 0.60R - 0.28G - 0.32B
Q = 0.21R - 0.52G + 0.31B

Which image format to use depends on the application. For example, you can’t rely on the RGB format when you operate your robot outdoors: varying sunlight brightness will ruin the whole application. In such cases, we rely on the HSV format, which separates brightness (Value) from the colour itself.
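The YIQ weights quoted above translate directly into code. This small sketch (my own helper, written just to illustrate the formulas) converts a single RGB pixel, with channels expressed in the 0.0–1.0 range, into its Y, I and Q components:

```python
def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel (channels in 0.0-1.0) to YIQ using the weights above."""
    y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
    i = 0.60 * r - 0.28 * g - 0.32 * b   # orange-blue chrominance
    q = 0.21 * r - 0.52 * g + 0.31 * b   # purple-green chrominance
    return y, i, q

# Pure white carries all its information in luminance: I and Q vanish.
y, i, q = rgb_to_yiq(1.0, 1.0, 1.0)
```

Notice that the Y weights sum to 1.00 while the I and Q weights each sum to 0, which is why a grey pixel (equal R, G, B) has zero chrominance.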

In outdoor as well as indoor robot applications, RGB-D cameras are frequently used. D here stands for depth, that is, the distance of the object seen by the camera from the camera. This data is stored in the form of point clouds; the data is 3D now because depth adds the third dimension. People also use stereo vision for calculating depth. RGB-D cameras like the Kinect are usually expensive, so in order to build a budget-friendly robot, the use of stereo vision (two cameras viewing the scene from slightly different positions) is common. Calibrating a stereo camera pair to calculate depth is an important aspect.
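To see how a depth value becomes a point-cloud point, here is a sketch of the standard pinhole back-projection, which an RGB-D pipeline applies to every pixel. The intrinsic values used below (focal lengths `fx`, `fy` and principal point `cx`, `cy`) are made-up example numbers, not the parameters of any particular camera; in practice they come from calibration.

```python
def depth_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth (in metres) into a 3D point
    in the camera frame, using the standard pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point maps straight onto the optical axis:
point = depth_to_point(320, 240, 2.0, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
```

Stereo vision reaches the same 3D point by a different route: it first estimates `depth` from the disparity between the two calibrated cameras, then back-projects exactly as above.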

OpenCV

Image credits: Opencv.org

Having learnt the basics of how an RGB image is formed, we need to understand how a robot’s brain will make sense of this information. Capturing an image and “knowing” what is in the image are two very different things. All a camera is capable of is capturing the image at the chosen resolution; it has no idea what the captured image is about. To interpret the details of the image, the robot’s brain has to do something. Tadaa! This introduces the need for a module that helps the brain understand, and that module is OpenCV. OpenCV essentially helps us perform various basic operations on an image and also provides the tools to transport images into ROS (we will discuss it in detail in further tutorials).

One needs to understand that image processing and computer vision are two different ways of playing with an image. When we perform image processing, we enhance the image by tweaking parameters like sharpness, smoothing and stretching; we perform basic transformations on the image to prepare it for further use. Computer vision, on the other hand, is about extracting information about what is in the image, and we use machine-learning tools to better understand an image’s subject.
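As a taste of the image-processing side of that distinction, here is a minimal smoothing operation written in plain NumPy (a naive box blur of my own, not the OpenCV routine we will meet later). It averages each pixel with its neighbours, which reduces noise; crucially, it needs no understanding whatsoever of what the image shows.

```python
import numpy as np

def box_blur(gray, k=3):
    """Smooth a grayscale image by averaging each pixel's k x k neighbourhood.
    Edge pixels reuse their nearest neighbours via 'edge' padding."""
    h, w = gray.shape
    pad = k // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):          # sum the k*k shifted copies of the image...
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)         # ...and divide to get the neighbourhood mean

flat = np.full((5, 5), 100.0)
blurred = box_blur(flat)  # a uniform image stays uniform after smoothing
```

Computer vision would ask a different question of the same array: not “how do I clean this up?” but “what object is in it?”, and that is where the Deep Learning part of this series comes in.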

In the next tutorial, I will help you understand more about OpenCV with some code in Python. Once we get comfortable with what OpenCV is all about, we will jump to the Deep Learning part. Till then, keep learning!

