The Bangla language has 50 complex-shaped characters, and recognizing handwritten characters across such a large set with an appropriate set of features is a difficult problem. Moreover, ambiguity and precision errors are common in handwritten words. Furthermore, several of these complex-shaped letters are quite similar in shape, which makes handwritten Bangla characters difficult to recognize. In this work, we propose a convolutional neural network (CNN)-based approach for recognizing the handwritten Bangla alphabet. In character recognition, CNNs outperform most other models. However, to guarantee satisfactory performance, CNNs usually need a large number of samples. Bangla handwriting recognition has been an active research topic for several years, but because many Bangla characters are similar, it is difficult to achieve good results. Trained and tested on Bangla character datasets, the model achieves 90.22% validation accuracy on the Bangla Lekha-isolated dataset and 93.40% validation accuracy on the Ekush dataset.
The main objective of this research is to recognize individual handwritten Bangla characters. With appropriate methods, this can be extended and used in different scenarios, such as transforming handwritten documents into digital Unicode documents and extracting information from national identity cards, driving licenses, bank cheques, and many more. The Convolutional Neural Network (CNN) (Albawi et al., 2017) is one of the best methods for classifying image data. A CNN mimics the visual cortex of the human brain. The visual layers of the human brain detect complex features in an image in order to recognize it. The same principle is applied in a CNN. In each CNN layer, different filters are applied over the image to extract feature images, from which a neural network then makes the prediction.
Previous work
Previous research in the field of Bangla character categorization has primarily focused on Bangla digits, of which there are ten. Only a few works are available on handwritten character recognition in Bangla. Other researchers have worked on Bangla handwritten character recognition, but all of them treated the 50 letters and the 10 numerals separately. Rahman et al. (2015) proposed a model that achieved 85.96% accuracy for the 50 letters. Purkayastha et al. (2017) proposed a model that achieved a higher accuracy of 89.01% for the 50 letters. Chowdhury et al. (2019) proposed a model that achieved 91.13% accuracy for the 50 letters and 98.42% accuracy for the 10 numerals.
Apart from these, several other works on Bangla handwritten character recognition have reported reasonable success. Halima Begum et al. (2017) worked with their own dataset, collected from 95 volunteers; their proposed model achieved recognition rates of around 68.9% without feature extraction and 79.4% with feature extraction. Das et al. (2009) reported accuracies of 76.86% for Bangla characters and 99.45% for Bangla numerals. Another line of work (Rahman et al., 2015; Rahman et al., 2022) achieved 85.36% test accuracy using its own dataset. Das et al. (2010) proposed handwritten Bangla character recognition with MLP and SVM and achieved recognition rates of around 79.73% and 80.9%, respectively.
Architecture
The proposed CNN model has 10 layers, all connected sequentially. The first layer is an input layer that defines the input image size and the number of color channels; in our model this is fixed at 32x32x1. Two convolutional layers follow back to back, one with 32 filters and the other with 64 filters. Both use the same 3x3 kernel size and the ReLU activation function. Next comes a max-pooling layer, which shrinks the image to half its size by keeping the maximum value in each window. Two more convolutional layers follow, mirroring the definitions of the first two. After that comes a flatten layer, which transforms the 2-dimensional data into 1-dimensional data. The next three layers are two dense layers with a dropout layer sandwiched between them. The first dense layer acts as a hidden layer onto which the 1-dimensional data is mapped. The dropout layer then randomly drops a fraction of the hidden units during training, according to its dropout rate. The last dense layer acts as the model output. It has as many nodes as there are classes to be classified and uses softmax as its activation function. In total, our model has 637,724 parameters, all of them trainable. Fig. 1 shows the architecture of the handwritten character recognition model. Table 1 shows the internal parameters we use for the model.
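To make the convolution, ReLU, and max-pooling steps concrete, here is a minimal pure-Python sketch on a toy 6x6 image. The image and the hand-written edge kernel are illustrative assumptions; in the actual model the kernels are learned during training, and the sketch assumes unpadded, stride-1 convolution.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over the image without padding (stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 6x6 "image" (assumption): a bright vertical stroke in columns 2 and 3.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]

# Illustrative 3x3 kernel (assumption) that responds to a dark-to-bright edge.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))   # 6x6 -> 4x4
pooled = max_pool_2x2(features)                # 4x4 -> 2x2
print(features)
print(pooled)
```

The 3x3 kernel trims one pixel from each border, and the 2x2 pooling then halves each spatial dimension, which is the shrinking effect described above.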
Table 1: Internal parameters for our Model.
Fig. 1: Architecture of the Handwritten Character Recognition Model.
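The layer sequence described above can be traced with simple shape and parameter arithmetic. The sketch below assumes unpadded, stride-1 convolutions, a 2x2 pooling window, and a hypothetical hidden-layer width (`HIDDEN_UNITS`), none of which are stated in the text. The totals it prints therefore depend on these assumptions and are not expected to reproduce the reported parameter count.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution (assumption)."""
    return size - kernel + 1


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


NUM_CLASSES = 60      # 50 letters + 10 numerals, from the text
HIDDEN_UNITS = 128    # hypothetical: the hidden width is not given in the text

size, channels = 32, 1                           # input layer: 32x32x1
summary = [("input", (size, size, channels), 0)]

# conv(32) -> conv(64) -> max-pool -> conv(32) -> conv(64)
for kind, filters in [("conv", 32), ("conv", 64), ("pool", None),
                      ("conv", 32), ("conv", 64)]:
    if kind == "conv":
        params = conv_params(channels, filters)
        size, channels = conv_out(size), filters
    else:
        params = 0
        size = size // 2                         # 2x2 pooling halves each side
    summary.append((kind, (size, size, channels), params))

flat = size * size * channels
summary.append(("flatten", (flat,), 0))
summary.append(("dense", (HIDDEN_UNITS,), dense_params(flat, HIDDEN_UNITS)))
summary.append(("dropout", (HIDDEN_UNITS,), 0))
summary.append(("softmax", (NUM_CLASSES,),
                dense_params(HIDDEN_UNITS, NUM_CLASSES)))

conv_total = sum(p for kind, _, p in summary if kind == "conv")

for kind, shape, params in summary:
    print(f"{kind:8s} {str(shape):14s} {params}")
print("convolutional parameters:", conv_total)
```

The four convolutional layers contribute only a small share of the weights; under these assumptions most parameters sit in the first dense layer, whose size is set by the flattened feature map.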
Compiling
The model is then compiled with the Adam optimizer, using a learning rate of 0.0001 and sparse_categorical_crossentropy as the loss function. The model is trained in a Google Colab GPU notebook.
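As a rough illustration of what this loss measures, the sketch below computes softmax probabilities and the sparse categorical cross-entropy for a single sample in plain Python. "Sparse" here means the label is an integer class index rather than a one-hot vector. The three-class logits are made-up values for illustration, and the function names are our own, not a library API.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)                  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class, probs):
    """Negative log-probability assigned to the integer class label."""
    return -math.log(probs[true_class])


logits = [2.0, 0.5, -1.0]                # illustrative scores for 3 classes
probs = softmax(logits)

loss_if_class_0 = sparse_categorical_crossentropy(0, probs)  # confident and right
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)  # confident and wrong
print(probs, loss_if_class_0, loss_if_class_2)
```

A model that spreads its probability evenly over all 60 classes would incur a loss of ln(60), about 4.09, which gives a useful baseline when reading the loss curves.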
Graphical User Interface
The graphical interface for the trained model is developed using web technologies. On the server side, the Python Flask module (Aslam & Mohammed, 2015) is used; it loads the pre-trained model and exposes an API to the frontend. The frontend is built with the JavaScript React framework; it communicates with the backend Flask server through the API and presents a graphical user interface to the user. Fig. 2 shows the output in the graphical user interface.
Fig. 2: Graphical User Interface built with ReactJS.
Datasets
Dataset preprocessing
In this work, we created a combined dataset of 50 letters and 10 numerals, totaling 60 characters, and trained our model on it. Multiple datasets are used to train the model. We used the TensorFlow image_dataset_from_directory API to create the training and validation datasets. This API can generate an image label from the name of the directory that contains the image. To use it, we renamed the directories "0" to "59", where "0" to "9" represent "০" to "৯" and "10" to "59" represent "অ" to "ঁ". All images in the dataset are resized to 28x28 using bilinear interpolation. Images are read as single-channel, normalized from 0-255 to 0-1, and batched together.
• Bangla Lekha-isolated dataset (Mithun et al., 2017): This is a dataset of 84 classes containing Bengali numbers, vowels, consonants, and compound characters. Each class contains approximately 2000 images. We are only interested in the first 60 classes, so the rest are deleted. The total number of images is therefore approximately 120000, of which 80% (96000) is used for the training set and 20% (24000) is reserved for the validation set.
• Ekush dataset (Rabby et al., 2019): This dataset contains 60 classes of images in two categories, male and female, which we merged into a single folder. Each class now contains approximately 3000 images. The total number of images is therefore approximately 180000, of which 80% (144000) is used for the training set and 20% (36000) is reserved for the validation set.
Fig. 3: Visual representation of a chunk of the dataset.
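The preprocessing steps above can be sketched in plain Python. The helper names are hypothetical, and only the ten numeral classes are mapped because the text does not list the exact order of the 50 letter classes. The sketch also illustrates one point worth checking: to our understanding, Keras orders inferred class names alphanumerically unless an explicit `class_names` list is passed, so a directory name such as "10" may sort before "2".

```python
BENGALI_DIGIT_ZERO = 0x09E6   # Unicode code point of the Bangla digit zero


def digit_for_directory(name):
    """Map directory names "0".."9" to the Bangla digits they represent."""
    index = int(name)
    if not 0 <= index <= 9:
        raise ValueError("only the ten numeral classes are mapped in this sketch")
    return chr(BENGALI_DIGIT_ZERO + index)


def normalize(pixels):
    """Rescale 8-bit grayscale values from 0-255 to 0-1."""
    return [[value / 255.0 for value in row] for row in pixels]


def split_counts(total, validation_fraction=0.2):
    """Return (training, validation) sizes for a hold-out split."""
    validation = round(total * validation_fraction)
    return total - validation, validation


# String sorting of the 60 directory names (assumed default ordering).
class_names = sorted(str(i) for i in range(60))

print(digit_for_directory("0"), digit_for_directory("9"))
print(normalize([[0, 255]]))
print(split_counts(120000), split_counts(180000))
print(class_names[:4])        # "10" and "11" come before "2"
```

If labels must equal the numeric directory names, one option is to pass the names in numeric order explicitly, or to zero-pad the directory names ("00" to "59") so that string order and numeric order agree.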
Accuracy and Performance
We trained our model on both datasets and achieved 90.22% accuracy on the Bangla Lekha-isolated dataset and 93.40% on the Ekush dataset.
Table 2: Performance comparison of our model.
Dataset | Classes | Accuracy
Bangla Lekha-isolated | 60 | 90.22%
Ekush | 60 | 93.40%
Fig. 4 (A): Training and Validation Accuracy for the Bangla Lekha-isolated Dataset.
Fig. 4 (B): Training and Validation Loss for the Bangla Lekha-isolated Dataset.
From the performance graph in Fig. 4, we can see that our model reaches its highest validation accuracy at epoch 8; after that, the model overfits. The red line in the graph indicates the best fit for our model.
From the performance graph in Fig. 5, we can see that our model reaches its highest validation accuracy at epoch 9; after that, the model overfits. The red line in the graph indicates the best fit for our model.
Fig. 5 (A): Training and Validation Accuracy for the Ekush Dataset.
Fig. 5 (B): Training and Validation Loss for the Ekush Dataset.
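Reading the best epoch off a validation curve, as done above for Fig. 4 and Fig. 5, amounts to taking the epoch with the highest validation accuracy. The sketch below shows that selection on a made-up accuracy history; the numbers are illustrative only and are not the curves from this work.

```python
def best_epoch(val_accuracy):
    """Return (1-based epoch, accuracy) of the highest validation accuracy."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1, val_accuracy[best_index]


# Hypothetical validation accuracies per epoch, for illustration only.
history = [0.71, 0.80, 0.84, 0.86, 0.875, 0.88, 0.89, 0.90, 0.895, 0.89]

epoch, accuracy = best_epoch(history)
print(f"best epoch: {epoch}, validation accuracy: {accuracy:.3f}")
```

Keeping the weights from that epoch, rather than from the final one, avoids carrying the later overfitting into the deployed model.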
This research evaluates the accuracy of our proposed model and compares it with other models. Accuracy can be measured on both noisy and noiseless datasets. The performance graphs show that our model reaches its highest validation accuracy of 90.22% on the Bangla Lekha-isolated dataset and 93.40% on the Ekush dataset.
We are grateful to Pabna University of Science and Technology (PUST) for supporting this research.
The authors state that there is no potential conflict of interest in publishing this research article.
Academic Editor
Dr. Toansakul Tony Santiboon, Professor, Curtin University of Technology, Bentley, Australia.
Associate Professor, Department of Information and Communication Engineering, Pabna-6600, Bangladesh.
Hossain MA, Hasan MAFMR, Abadin AFMZ, and Fatta N. (2022). Bangla handwritten characters recognition using convolutional neural network. Aust. J. Eng. Innov. Technol., 4(2), 27-31. https://doi.org/10.34104/ajeit.022.027031