Convolutional neural network

CONVOLUTIONAL NEURAL NETWORKS



Convolutional neural networks (CNNs) are used in image identification, speech recognition, pattern recognition, and similar tasks.

To understand how a convolutional neural network works, we will divide the topic into 5 sections:


1. Convolutional layer: In this layer we pass a convolution filter (the kernel) over an image. At each position, every element is added to its neighbors, weighted by the kernel. Depending on the kernel, this can sharpen the image and highlight its characteristic features. Furthermore, in a single convolution layer a large number of filters are convolved over the image (for this project, the maximum number of filters in a layer will be 64).
Fig: A basic illustration of a convolution layer
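As a rough illustration of the sliding-filter idea, here is a minimal sketch in plain Python. It assumes stride 1 and no padding, and it does not flip the kernel (the convention most deep learning libraries follow). The function name `convolve2d`, the toy image, and the edge-detecting kernel are made up for this example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image: dark left half, bright right half (a vertical edge).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map is non-zero only where the kernel straddles the edge, which is the sense in which a filter highlights a characteristic feature.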

2. Pooling layer: Here we pass a pooling window over the image data to reduce computation time: the window shrinks the overall size of the image while keeping only its most important parts. This can also help the model learn from the image more accurately. In this project we will use max pooling for the pooling layer.

Max pooling: As the name suggests, this pooling layer takes the maximum value inside the pooling window at each position as the window passes over the image.
Fig: An example of a max pooling layer.
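Max pooling can be sketched in a few lines of plain Python. This is only an illustration, assuming non-overlapping windows (stride equal to the pool size); `max_pool2d` and the sample values are invented for the example.

```python
def max_pool2d(image, pool_size=2):
    """Keep the maximum of each non-overlapping pool_size x pool_size window."""
    pooled = []
    for i in range(0, len(image) - pool_size + 1, pool_size):
        row = []
        for j in range(0, len(image[0]) - pool_size + 1, pool_size):
            window = [image[i + di][j + dj]
                      for di in range(pool_size)
                      for dj in range(pool_size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]
pooled = max_pool2d(image)
print(pooled)  # [[6, 4], [7, 9]]
```

A 4x4 input becomes a 2x2 output, so later layers process a quarter of the values.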


3. Flattening layer: After passing the image through all the convolution and pooling layers, we have to flatten the result so that we can pass these values through the neural network for training and prediction.

Fig: How the flattening layer produces the input layer for the neural net



4. Fully connected layer: In this layer we pass the flattened image to a fully connected neural network through the network's input nodes.

Fig: The output of the flattening layer is taken as input for the neural net
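The hand-off from flattening to a fully connected layer might look like the following sketch. The helper names `flatten` and `dense`, the feature maps, and the weights are all hypothetical; a real network learns its weights during training.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output node sees every input value."""
    return [sum(w * x for w, x in zip(node_weights, inputs)) + b
            for node_weights, b in zip(weights, biases)]


# Two 2x2 feature maps, as might come out of the last pooling layer.
feature_maps = [
    [[6, 4], [7, 9]],
    [[1, 0], [2, 3]],
]
flat = flatten(feature_maps)
print(flat)  # [6, 4, 7, 9, 1, 0, 2, 3]

# Hand-picked weights for two output nodes (illustrative only).
weights = [
    [1, 0, 0, 0, 0, 0, 0, 1],  # adds the first and last input
    [0.5] * 8,                 # half the sum of all inputs
]
biases = [0, -1]
outputs = dense(flat, weights, biases)
print(outputs)  # [9, 15.0]
```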



5. Dropout layer: This layer randomly deactivates a fraction of the nodes of the neural net during training, so the network cannot rely too heavily on any single node. This helps prevent overfitting the model.
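One common way to implement dropout is the "inverted" variant sketched below, where the surviving activations are scaled up so that their expected sum stays the same. The scaling step, the function name `dropout`, and the fixed seed are assumptions made for this illustration.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` (training time only).

    Survivors are scaled by 1 / (1 - rate) so the expected total is unchanged.
    """
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=rng)

zeroed = dropped.count(0.0)
print(f"{zeroed} of {len(dropped)} nodes were deactivated")
```

With a rate of 0.5, roughly half of the nodes are zeroed on each training step, and a different random subset is chosen every time.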

STREET SIGN CLASSIFICATION

DATA PREPARATION


Before we begin identifying street signs, we prepare the data so that each image is properly aligned with its label.


Fig: 3 out of a total of 42 classes of images

ii) Split the data into training and validation sets, checking that each image stays aligned with its label.

iii) Display a few images with their labels as a quick check.

iv) Check the frequency of each label, that is, how many images belong to each class.
Fig: The y-axis shows the number of images and the x-axis shows the image class number, which corresponds directly to the label
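The data preparation steps above can be sketched as follows. This is a minimal illustration with placeholder strings standing in for real images; `split_train_validation`, the 20% validation fraction, and the toy data are assumptions, not details from the project.

```python
import random
from collections import Counter


def split_train_validation(images, labels, validation_fraction=0.2, seed=0):
    """Shuffle image/label pairs together, then split, so pairs stay aligned."""
    pairs = list(zip(images, labels))
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * validation_fraction)
    return pairs[n_val:], pairs[:n_val]  # (training pairs, validation pairs)


# Placeholder data: each "image" is a string that encodes its true class.
classes = [0, 1, 2]
images = [f"image_of_class_{c}" for c in classes for _ in range(5)]
labels = [c for c in classes for _ in range(5)]

train, val = split_train_validation(images, labels)

# Quick check: show a few images next to their labels.
for image, label in train[:3]:
    print(image, "->", label)

# Frequency of each label in the training set.
class_counts = Counter(label for _, label in train)
print(sorted(class_counts.items()))
```

Shuffling the pairs rather than the two lists separately is what keeps every image attached to its own label.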

IMAGE AUGMENTATION

i) Convert the image to grayscale. This brings the number of image channels down from 3 to 1.

ii) Next, equalize the histogram. This is done because a bright picture has all of its pixels confined to the higher range of intensity values, whereas a good image has pixels spread across the whole intensity range. To achieve this, we stretch the histogram toward both ends.



iii) Write a function, image preprocess, that passes a given image through the steps mentioned above and returns the equalized output image.
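The steps above can be sketched in plain Python as shown here. The grayscale weights (0.299, 0.587, 0.114) are the widely used luma coefficients and the equalization follows the usual cumulative-histogram formula; whether the project used exactly these is an assumption, and the function names are illustrative.

```python
def to_grayscale(rgb_image):
    """Collapse the 3 colour channels into 1 using common luma weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def equalize_histogram(gray, levels=256):
    """Stretch intensities over the full range using the cumulative histogram."""
    pixels = [p for row in gray for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # single-intensity image: nothing to stretch
        return [row[:] for row in gray]
    scale = (levels - 1) / (total - cdf_min)
    return [[int(round((cdf[p] - cdf_min) * scale)) for p in row] for row in gray]


def preprocess(rgb_image):
    """Grayscale first, then equalize."""
    return equalize_histogram(to_grayscale(rgb_image))


# A bright 2x2 image: every pixel sits in the upper intensity range.
bright = [
    [(200, 200, 200), (210, 210, 210)],
    [(220, 220, 220), (230, 230, 230)],
]
result = preprocess(bright)
print(result)  # [[0, 85], [170, 255]]
```

The input intensities span only 200 to 230, while the output spans the full 0 to 255 range.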



IMAGE CLASSIFICATION MODEL


i) For image classification, we will use a convolutional neural network

ii) Write a function that returns the model, following the steps below:

a. Make the first convolution layer with 60 filters and a kernel size of (5,5), accepting an input shape of (32,32,1), with the rectified linear unit as the activation function. Rectified Linear Unit (relu):

Fig: Rectified linear unit (relu)




b. Make the second convolutional layer the same way, with the only difference that we do not specify the input shape, since the first convolutional layer already takes care of it.

c. Make a max-pooling layer with a pool size of (2,2).

d. Make the 3rd and 4th convolution layers with 30 filters each, a kernel size of (3,3), and the rectified linear unit activation function.

e. Finally, add a max-pooling layer with a pool size of (2,2) and pass the image data through the flattening layer so that it can go on to the hidden layers.

f. Add the first hidden layer with 500 output nodes and the rectified linear unit activation function.

g. Add a dropout layer with a rate of 50%, which randomly deactivates 50% of the nodes during training.

h. Add the final output layer with an output size equal to the number of distinct labels and a softmax activation function, to calculate the probability of the image belonging to each label.

Softmax:

Fig: Sigmoid function




iii) Finally, compile the model with the Adam optimizer at a learning rate of 0.001 and categorical cross-entropy as the loss function.
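To see how the layers in steps a to h fit together, it can help to trace the shape of the data through the network. The sketch below does only that arithmetic; it is not the model itself. It assumes stride 1 with no padding for the convolutions and non-overlapping pooling windows, neither of which the steps state explicitly, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    h, w, c = input_shape
    shapes = []
    for layer in layers:
        if layer[0] == "conv":
            _, filters, kernel = layer
            # Assumption: stride 1 and no padding.
            h, w, c = h - kernel + 1, w - kernel + 1, filters
        elif layer[0] == "pool":
            _, pool = layer
            # Assumption: non-overlapping pooling windows.
            h, w = h // pool, w // pool
        shapes.append((h, w, c))
    return shapes


layers = [
    ("conv", 60, 5),  # a. first convolution layer
    ("conv", 60, 5),  # b. second convolution layer
    ("pool", 2),      # c. max pooling
    ("conv", 30, 3),  # d. third convolution layer
    ("conv", 30, 3),  # d. fourth convolution layer
    ("pool", 2),      # e. max pooling
]
shapes = trace_shapes((32, 32, 1), layers)
for layer, shape in zip(layers, shapes):
    print(layer, "->", shape)

h, w, c = shapes[-1]
flattened_size = h * w * c
print("values entering the 500-node hidden layer:", flattened_size)  # 480
```

Under these assumptions the 32x32 grayscale image shrinks to a 4x4 map with 30 channels, so the flattening layer hands 480 values to the hidden layer.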


TRAINING THE DATASET

To train the model, fit it through a generator with 2000 steps per epoch for 10 epochs, shuffling the data after each epoch. Store the result of this call in an object so that it can be used in the output analysis.
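The idea of feeding training data through a generator with a fixed number of steps per epoch can be sketched as below. This is a toy stand-in, not the project's generator: `batch_generator` is hypothetical, the dataset is six placeholder items, and the small step and epoch counts replace the 2000 steps and 10 epochs used in the project.

```python
import random


def batch_generator(images, labels, batch_size, seed=0):
    """Yield (image_batch, label_batch) forever, reshuffling before each pass."""
    rng = random.Random(seed)
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [images[i] for i in batch], [labels[i] for i in batch]


images = [f"img{i}" for i in range(6)]
labels = list(range(6))
generator = batch_generator(images, labels, batch_size=2)

steps_per_epoch, epochs = 3, 2  # toy values; the project uses 2000 and 10
pairs_per_epoch = []
for epoch in range(epochs):
    seen = []
    for _ in range(steps_per_epoch):
        image_batch, label_batch = next(generator)
        # A real training loop would update the model on this batch here.
        seen.extend(zip(image_batch, label_batch))
    pairs_per_epoch.append(seen)
    print(f"epoch {epoch + 1}: labels seen in order {[label for _, label in seen]}")
```

In this toy setup one epoch covers the dataset exactly once, and each epoch visits the samples in a fresh order while every image stays paired with its label.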

OUTPUT ANALYSIS

i) Loss function:
Fig: The plot shows the training loss decreasing together with the validation loss (the model generalizes successfully)





ii) Accuracy:

Fig: The plot shows the accuracy on the training dataset compared with the validation dataset.
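For reference, the two quantities plotted above can be computed by hand as in this sketch, which uses the standard definitions of softmax, categorical cross-entropy, and accuracy. The raw output scores and labels are invented for the example and are not results from the project.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_class):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_class])


def accuracy(batch_probs, batch_labels):
    """Fraction of samples whose most probable class is the true class."""
    correct = sum(1 for probs, label in zip(batch_probs, batch_labels)
                  if probs.index(max(probs)) == label)
    return correct / len(batch_labels)


# Made-up raw scores for 3 samples over 3 classes, with their true labels.
scores = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 0.1]]
labels = [0, 1, 2]

probs = [softmax(s) for s in scores]
losses = [categorical_cross_entropy(p, y) for p, y in zip(probs, labels)]
mean_loss = sum(losses) / len(losses)
acc = accuracy(probs, labels)
print(f"mean loss: {mean_loss:.3f}, accuracy: {acc:.3f}")
```

The third sample is misclassified, so it contributes the largest loss and brings the accuracy down to two out of three.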




Fig: The test prediction for a random test image passed through the model. Here, 34 is the class number of ‘Turn left ahead’.


