{"id":454,"date":"2021-12-31T00:05:00","date_gmt":"2021-12-30T23:05:00","guid":{"rendered":"https:\/\/raulbalanza.me\/?p=454"},"modified":"2024-01-18T23:18:51","modified_gmt":"2024-01-18T22:18:51","slug":"recognizing-handwritten-digits-efficiently","status":"publish","type":"post","link":"https:\/\/raulbalanza.me\/es\/2021\/12\/31\/recognizing-handwritten-digits-efficiently\/","title":{"rendered":"Recognizing handwritten digits efficiently"},"content":{"rendered":"<p><em><br>This article was originally written in English. However, an automatically translated Spanish version can be found&nbsp;<a href=\"https:\/\/raulbalanza-me.translate.goog\/2021\/12\/31\/recognizing-handwritten-digits-efficiently\/?_x_tr_sl=en&amp;_x_tr_tl=es&amp;_x_tr_hl=es&amp;_x_tr_pto=wapp\">here<\/a>.<\/em><\/p>\n\n\n\n<p><em>The complete scientific paper with a detailed description of all the performed experiments and their solutions is available <a href=\"https:\/\/raulbalanza.me\/es\/pdf\/handwritten_digits_classification.pdf\/\" data-type=\"URL\" data-id=\"https:\/\/raulbalanza.me\/pdf\/handwritten_digits_classification.pdf\">here<\/a>.<\/em><\/p>\n\n\n\n<p>In September of this year, I moved to Prague (Czech Republic \ud83c\udde8\ud83c\uddff) to spend five months studying at the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.cvut.cz\/en\" target=\"_blank\">Czech Technical University in Prague<\/a> (\u010cVUT) as part of an international exchange program during the last semester of my Bachelor&#8217;s degree.<\/p>\n\n\n\n<p>For the semester project of one of the courses I am taking, I have had the opportunity to do some research comparing several <em>Artificial Intelligence<\/em> approaches for the same purpose: recognizing and classifying handwritten digits.<\/p>\n\n\n\n<p>The task of <strong>recognizing handwritten digits<\/strong> consists of observing an image of a written number and determining to which digit the image corresponds. 
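Framed as code, this is a 10-class image classification problem: each sample is a grid of grayscale pixel intensities and the label is the digit it depicts. A minimal sketch using scikit-learn's bundled `load_digits` set (8&times;8 images), used here only as a small stand-in for MNIST so the snippet runs without downloads:

```python
from sklearn.datasets import load_digits

# Each sample is a small grayscale image of a handwritten digit;
# the label is the digit (0-9) that the image depicts.
digits = load_digits()
print(digits.images.shape)                   # (1797, 8, 8)
print(digits.data.shape)                     # (1797, 64) - images flattened to feature vectors
print(sorted(set(digits.target.tolist())))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Every approach below consumes the same kind of input (a pixel vector) and produces the same kind of output (a predicted digit); they differ only in how the mapping is learned.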
This task, which is trivial for humans, has been deeply studied in the AI field, obtaining almost perfect results. However, the objective of this research was not (only) to find the most accurate approach: it was to find a balance between high accuracy and high &#8216;efficiency&#8217; (i.e., a model requiring the least amount of computing resources).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The dataset<\/h2>\n\n\n\n<p><em><a href=\"http:\/\/yann.lecun.com\/exdb\/mnist\/\" target=\"_blank\" rel=\"noreferrer noopener\">MNIST<\/a><\/em>, a very well-known computer vision dataset, was used for all the experiments that were carried out. Since its release, this dataset has served as a basis for benchmarking classification algorithms. It consists of a training set of 60K examples and a test set of 10K examples.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" style=\"background-color: white;\" src=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-1024x424.png\" alt=\"\" class=\"wp-image-459\" width=\"375\" height=\"155\" srcset=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-1024x424.png 1024w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-300x124.png 300w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-768x318.png 768w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-1536x635.png 1536w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-16x7.png 16w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset-1200x496.png 1200w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/mnist-dataset.png 1709w\" sizes=\"(max-width: 375px) 85vw, 375px\" \/><figcaption>Sample digits from the MNIST dataset<\/figcaption><\/figure><\/div>\n\n\n\n<p>Although this dataset is effectively solved, it is still widely 
used for estimating the performance of models and deciding how to explore possible improvements. There also exist several variants of this dataset with different types of data, designed for similar purposes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Experiments<\/h2>\n\n\n\n<p>During this research, a total of four computational intelligence techniques were used, with at least three variations of each in order to optimize the accuracy of every model as far as possible. A brief description of each technique follows.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Convolutional Neural Networks<\/h5>\n\n\n\n<p>CNNs are Deep Learning models that assign importance to various aspects of the input samples and are able to differentiate them from each other. The preprocessing required by this approach is much lighter than that of other classification algorithms, and the architecture is analogous to the connectivity pattern of neurons in the human brain.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist-1024x344.png\" alt=\"\" class=\"wp-image-472\" width=\"433\" height=\"145\" srcset=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist-1024x344.png 1024w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist-300x101.png 300w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist-768x258.png 768w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist-16x5.png 16w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/cnn-mnist.png 1108w\" sizes=\"(max-width: 433px) 85vw, 433px\" \/><figcaption class=\"wp-element-caption\">Example scheme of a CNN designed for the MNIST dataset<\/figcaption><\/figure><\/div>\n\n\n<p>After designing a model and applying certain modifications to try to improve the final results, an accuracy of 
<strong>98.85%<\/strong> was reached during <span style=\"text-decoration: underline;\">training<\/span>; and <strong>99.31%<\/strong> during <span style=\"text-decoration: underline;\">testing<\/span>.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Zero-shot learning<\/h5>\n\n\n\n<p>ZSL is a problem setup in machine learning where, at test time, a learner observes samples from classes that were not observed during training, and needs to predict the class they belong to. These methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes distinguishing properties of objects.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-1024x558.png\" alt=\"\" class=\"wp-image-478\" width=\"439\" height=\"239\" srcset=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-1024x558.png 1024w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-300x163.png 300w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-768x418.png 768w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-16x9.png 16w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram-1200x654.png 1200w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/zsl-diagram.png 1300w\" sizes=\"(max-width: 439px) 85vw, 439px\" \/><figcaption class=\"wp-element-caption\">Example scheme of a ZSL model<\/figcaption><\/figure><\/div>\n\n\n<p>A model was designed in which some digits were provided in the training phase and some only in the testing (inference) phase. An accuracy of <strong>72.42%<\/strong> was reached during <span style=\"text-decoration: underline;\">training<\/span>; and <strong>48.06%<\/strong> during <span style=\"text-decoration: underline;\">testing<\/span>.<\/p>\n\n\n\n<p>However, it must be taken 
into account that this approach makes predictions with a limited amount of information at training time, which makes the results quite good nonetheless.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><em>k<\/em>-Nearest Neighbors classifier<\/h5>\n\n\n\n<p><em>k<\/em>-NN is a non-parametric method used for classification, among other tasks. The input consists of the <em>k<\/em> closest training examples in a dataset, and the output is a class membership: an object is assigned to the most common class among its nearest neighbors.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/knn-concept.jpg\" alt=\"\" class=\"wp-image-481\" width=\"319\" height=\"239\" srcset=\"https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/knn-concept.jpg 800w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/knn-concept-300x225.jpg 300w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/knn-concept-768x576.jpg 768w, https:\/\/raulbalanza.me\/wp-content\/uploads\/2021\/12\/knn-concept-16x12.jpg 16w\" sizes=\"(max-width: 319px) 85vw, 319px\" \/><figcaption class=\"wp-element-caption\">k-NN method classification example<\/figcaption><\/figure><\/div>\n\n\n<p>In this case, data was preprocessed using the <a rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Principal_component_analysis\" target=\"_blank\">PCA<\/a> dimensionality reduction technique (to remove noise from each sample) and then classified using the <em>k<\/em>-NN method with different values of <em>k<\/em>. 
An accuracy of <strong>97.60%<\/strong> was reached during <span style=\"text-decoration: underline;\">training<\/span>; and <strong>97.16%<\/strong> during <span style=\"text-decoration: underline;\">testing<\/span>.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Probability distributions<\/h5>\n\n\n\n<p>Finally, the last tested approach was to assume that the data followed different probability distributions and to try to estimate their parameters. In total, three distributions were explored.<\/p>\n\n\n\n<p>Some smoothing techniques were applied to the data to avoid overfitting and, because each of these probability distributions is suited to different kinds of data, the samples had to be preprocessed in some cases.<\/p>\n\n\n\n<p>The results obtained for each of the distributions were:<\/p>\n\n\n\n<ul>\n<li><strong>Bernoulli<\/strong> distribution: 83.47% training | 84.38% testing<\/li>\n\n\n\n<li><strong>Multinomial<\/strong> distribution: 82.15% training | 83.67% testing<\/li>\n\n\n\n<li><strong>Gaussian<\/strong> distribution: 95.73% training | 95.82% testing<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusions<\/h2>\n\n\n\n<p>After running all the experiments, several interesting results were obtained. As can be seen, the digit recognition task still has many aspects to explore in order to find an optimal approach that balances accuracy and efficiency in terms of computing requirements.<\/p>\n\n\n\n<p>The numbers show that the <strong>CNN<\/strong> model is the best performing one, followed closely by the <strong><em>k<\/em>-NN<\/strong> approach. 
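The Gaussian variant of the probability-distribution approach amounts to fitting per-class Gaussians to the pixel values and classifying by maximum posterior; scikit-learn's `GaussianNB` (which additionally assumes independent pixels) is a readily available approximation of that idea. Again this runs on `load_digits` rather than MNIST purely as an offline illustration, and the `var_smoothing` value is an assumed setting standing in for the smoothing techniques mentioned above, not the paper's.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit one Gaussian per class and per pixel; var_smoothing adds a small
# variance floor, a simple smoothing step that avoids degenerate
# (near-zero) variance estimates for pixels that are almost constant.
clf = GaussianNB(var_smoothing=1e-2)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Training here is just parameter estimation (means and variances), which is why this family of models is so cheap compared to a CNN.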
Considering the cost of training both approaches, and given the very small difference in their results, the most <em>efficient<\/em> choice in this case would be the second one.<\/p>\n\n\n\n<p>Moreover, with respect to the <strong>probability distributions<\/strong>, promising results were obtained for the Gaussian case, with an accuracy close to that of the two best approaches above. However, the samples do not fit the Bernoulli and Multinomial distributions as well.<\/p>\n\n\n\n<p>Finally, the <strong>ZSL<\/strong> method obtained the worst results. Nevertheless, that model is, in fact, characterized by having the least amount of input information of all the considered approaches. It can be concluded that although the set of classes in MNIST is fixed and well-defined, the results would be remarkable for a dataset whose set of classes can expand with more dynamic information.<\/p>","protected":false},"excerpt":{"rendered":"<p>This article was originally written in English. However, an automatically translated version can be found&nbsp;here. The complete scientific paper with a detailed description of all the performed experiments and their solutions is available here. 
In September of this year, I moved to Prague (Czech Republic \ud83c\udde8\ud83c\uddff) to spend five months studying at the Czech Technical &hellip; <a href=\"https:\/\/raulbalanza.me\/es\/2021\/12\/31\/recognizing-handwritten-digits-efficiently\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Recognizing handwritten digits efficiently&#8221;<\/span><\/a><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/posts\/454"}],"collection":[{"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":36,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/posts\/454\/revisions"}],"predecessor-version":[{"id":579,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/posts\/454\/revisions\/579"}],"wp:attachment":[{"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/media?parent=454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/categories?post=454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/raulbalanza.me\/es\/wp-json\/wp\/v2\/tags?post=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}