Bangla Handwritten Characters Recognition Using Convolutional Neural Network



INTRODUCTION:
The main objective of this research is to recognize individual Bangla handwritten characters. The method can be extended further with appropriate techniques and used in different scenarios, such as transforming handwritten documents into digital Unicode documents and extracting information from national identity cards, driving licenses, bank cheques, and more. The Convolutional Neural Network (CNN) (Albawi et al., 2017) is one of the best-performing methods for classifying image data. A CNN mimics the visual cortex of the human brain: the visual layers of the brain detect complex features in an image in order to recognize it, and the same principle applies in a CNN. In the convolutional layers, different filters are applied over the image to extract feature maps, from which a neural network performs the prediction.

Previous work
Previous research in the field of Bangla character classification has primarily focused on Bangla digits, of which there are ten. Only a few works are available for handwritten Bangla character recognition, and those that do exist treated the 50 letters and 10 numerals separately.

METHODOLOGY: Architecture
The proposed CNN model has 10 layers, all connected sequentially. The first layer is an input layer that defines the input image size and the number of color channels; in our model, its static value is (32x32x1). Then two convolutional layers are connected back to back: one has 32 filters and the other has 64 filters, and each has the same kernel size of 3x3 and the ReLU activation function. After that comes a special layer known as Max-Pooling, which shrinks the image to half its size by picking the maximum value in each window. Then come two more convolutional layers that mimic the previous two layers' definitions. Next is a Flatten layer, which transforms 2-dimensional data into 1-dimensional data. The following two dense layers and a dropout layer are connected in a sandwich manner, with the dropout layer in the middle. The first dense layer acts as a hidden layer onto which the 1-dimensional data is mapped; the dropout layer then randomly drops some units during training to reduce overfitting. The last dense layer acts as the model output: it has the same number of nodes as the number of classes to be classified and uses the softmax activation function. In total, our model has 637,724 parameters, all of which are trainable. Fig. 1 shows the architecture of the handwritten character recognition model, and Table 1 shows the internal parameters we use for the model.
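The layer sequence described above can be sketched in Keras roughly as follows. The hidden-layer width and dropout rate are assumptions (the paper does not state them), so the parameter count of this sketch will differ from the reported 637,724:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Layer order follows the description in the text; the hidden dense width (128)
# and dropout rate (0.5) are assumed values, not taken from the paper.
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),           # grayscale 32x32 input
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),               # halves the spatial size
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),                          # 2-D feature maps -> 1-D vector
    layers.Dense(128, activation="relu"),      # hidden layer (width assumed)
    layers.Dropout(0.5),                       # randomly drops units in training
    layers.Dense(60, activation="softmax"),    # 60 classes: 50 letters + 10 numerals
])
```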

Graphical User Interface
The graphical interface for the trained model is developed using web technologies. For the server-side workload, the Python Flask module (Aslam & Mohammed, 2015) is used; it loads the pre-trained model and provides an API to the frontend. For the frontend, the JavaScript React framework is used; it communicates with the backend Flask server through the API and provides a graphical user interface to the user. Fig. 2 shows the output of the graphical user interface.

Datasets and preprocessing
In this work, we have created a combined dataset of 50 letters and 10 numerals, totaling 60 characters, and trained our model on it. Multiple datasets are used to train the model. We have used the TensorFlow 'image_dataset_from_directory' API to create the training and validation datasets. This API is capable of generating an image label from its directory name. To do so, we renamed the class directories to "0" through "59", where "0" to "9" represent "০" to "৯" and "10" to "59" represent "অ" to "◌ ". All of the images in the dataset are resized to 28x28 using bilinear interpolation. Images are read as a single channel, normalized from 0-255 to 0-1, and batched together.
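The loading pipeline described above can be sketched as follows. To keep the sketch self-contained, it generates a tiny dummy dataset with three stand-in class directories instead of the real "0" to "59"; all other settings are assumptions consistent with the text:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Generate a small dummy dataset: three class directories of random grayscale PNGs.
root = tempfile.mkdtemp()
for cls in ["0", "1", "2"]:                      # stand-ins for "0".."59"
    os.makedirs(os.path.join(root, cls))
    for i in range(10):
        img = np.random.randint(0, 256, (40, 40, 1), dtype=np.uint8)
        tf.keras.utils.save_img(os.path.join(root, cls, f"{i}.png"), img)

common = dict(
    labels="inferred",            # label is taken from the directory name
    label_mode="int",
    color_mode="grayscale",       # single-channel read
    image_size=(28, 28),          # resized with bilinear interpolation (the default)
    batch_size=8,
    validation_split=0.2,         # 80/20 train/validation split
    seed=42,
)
train_ds = tf.keras.utils.image_dataset_from_directory(
    root, subset="training", **common)
val_ds = tf.keras.utils.image_dataset_from_directory(
    root, subset="validation", **common)

class_names = train_ds.class_names   # capture before .map(), which drops the attribute

# Pixels arrive in 0-255; rescale to 0-1 as described in the text.
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))
```

One caveat worth noting: this API sorts class names alphanumerically, so with directories "0" through "59" the label index does not equal `int(directory_name)` (e.g. "10" sorts before "2"); the mapping must be read from `class_names`.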

Accuracy and Performance
We trained our model on both datasets, achieving 90.22% accuracy on the BanglaLekha-Isolated dataset and 93.40% on the Ekush dataset. The performance graph in Fig. 4 shows that our model reaches its highest validation accuracy within 8 epochs; after that, the model overfits. The red line in the graph indicates the best fit for our model.
The performance graph in Fig. 5 shows that our model reaches its highest validation accuracy within 9 epochs; after that, the model overfits. The red line in the graph indicates the best fit for our model.
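A stopping criterion of this kind, keeping the weights from the best-validation epoch before overfitting sets in, can be expressed with Keras callbacks. The patience value and output path below are assumptions, not settings reported in the paper:

```python
import tensorflow as tf

# Stop once validation accuracy has not improved for a few epochs and roll
# back to the best epoch (e.g. epoch 8 or 9 as observed in the graphs).
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=3,                  # assumed patience value
        restore_best_weights=True,
    ),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras",          # hypothetical output path
        monitor="val_accuracy",
        save_best_only=True,
    ),
]
# Usage (assuming a compiled model and the datasets from the preprocessing step):
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)
```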

CONCLUSION:
This research evaluates the accuracy of our proposed model and compares it with another model. We report accuracy on both a noisy dataset and a noiseless dataset. The performance graphs show that our model reaches its highest validation accuracy of 90.22% on the BanglaLekha-Isolated dataset and 93.40% on the Ekush dataset.

Fig. 3: Visual representation of a chunk of the dataset.

Table 1: Internal parameters for our model.
• BanglaLekha-Isolated dataset (Mithun et al., 2017): a dataset of 84 classes containing Bengali numerals, vowels, consonants, and compound characters, with approximately 2000 images per class. We are only interested in the first 60 classes; the rest are discarded. The total is therefore approximately 120,000 images, of which 80% (96,000) is used for the training set and 20% (24,000) is reserved for the validation set.
• Ekush dataset (Rabby et al., 2019): this dataset contains 60 classes of images in two categories, male and female, which we merged into a single folder. Each class then contains approximately 3000 images, for a total of approximately 180,000 images, of which 80% (144,000) is used for the training set and 20% (36,000) is reserved for the validation set.

Table 2: Performance comparison of our model.