{"id":54716,"date":"2022-02-23T00:04:05","date_gmt":"2022-02-22T18:34:05","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=54716"},"modified":"2022-02-23T00:04:05","modified_gmt":"2022-02-22T18:34:05","slug":"oneshot-learning","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/oneshot-learning\/","title":{"rendered":"OneShot Learning"},"content":{"rendered":"<p><em><b>Written by <\/b><a href=\"http:\/\/linkedin.com\/in\/tushar-raj-verma-a72455191\"><b>Tushar Raj Verma<\/b><\/a> <strong>&amp;<\/strong><b>\u00a0<\/b><a href=\"https:\/\/www.linkedin.com\/in\/abhishek-kumar-upadhyay\/\"><b>Abhishek Kumar Upadhyay<\/b><\/a><\/em><\/p>\n<p><span style=\"font-weight: 400;\">Deep Convolutional Neural Networks have become the state of the art methods for image classification tasks. <\/span><span style=\"font-weight: 400;\">However, one of the biggest challenges is that they require a lot of labeled data to train the model. In many applications, collecting this much data is sometimes not feasible. One-Shot Learning aims to solve this problem. <\/span><span style=\"font-weight: 400;\">To understand <\/span><span style=\"font-weight: 400;\">One-Shot Learning<\/span><span style=\"font-weight: 400;\">, we first need to understand Siamese networks.<\/span><\/p>\n<p><b>Siamese Networks<\/b><\/p>\n<p><span style=\"font-weight: 400;\">This is a type of model network architecture that contains two or more identical subnetworks which are used to generate feature vectors(or embeddings) for each input and compare them. <\/span><span style=\"font-weight: 400;\">Siamese Networks can be applied in use cases, like face recognition, detecting duplicates, and finding anomalies. <\/span><span style=\"font-weight: 400;\">Our goal is for the model to learn to estimate the similarity between images. For this, we will provide three images to the model, where two of them will be similar (<\/span><i><span style=\"font-weight: 400;\">anchor<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">positive<\/span><\/i><span style=\"font-weight: 400;\"> samples), and the third will be unrelated (a <\/span><i><span style=\"font-weight: 400;\">negative<\/span><\/i><span style=\"font-weight: 400;\"> example). <\/span><span style=\"font-weight: 400;\">For the network to learn, we use a triplet loss function.<\/span><\/p>\n<p><b>Triplet Loss<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Its introduction was first introduced in the <\/span><a href=\"https:\/\/arxiv.org\/pdf\/1503.03832.pdf\"><span style=\"font-weight: 400;\">FaceNet paper<\/span><\/a><span style=\"font-weight: 400;\">. TripletLoss is a loss function that trains a neural network to closely embed features of the same class while maximizing the distance between embeddings of different classes. 
**Semi-Hard Online Learning**

The best results come from triplets known as "semi-hard": triplets where the negative is farther from the anchor than the positive is, but which still produce a positive loss. To find these triplets efficiently, we use online learning and train only on the semi-hard examples within each batch, as in the sketch below.
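One practical way to get this behavior is sketched below, assuming TensorFlow plus the TensorFlow Addons package, whose `TripletSemiHardLoss` performs exactly this per-batch online mining. The network architecture and hyperparameters here are illustrative choices of ours, not necessarily those used for the results in this post:

```python
import tensorflow as tf
import tensorflow_addons as tfa

# A small embedding model for 28x28 MNIST images: it outputs L2-normalized
# 64-dimensional vectors, so Euclidean distances between them are well-behaved.
embedding_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64),
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)),
])

# TripletSemiHardLoss receives integer class labels plus the batch of
# embeddings, mines the semi-hard triplets inside each batch online,
# and averages their triplet loss.
embedding_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tfa.losses.TripletSemiHardLoss(margin=1.0),
)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
embedding_model.fit(x_train / 255.0, y_train, batch_size=256, epochs=5)
```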
**WorkFlow**

![One-shot learning workflow](/blog/wp-ttn-blog/uploads/2022/02/oneshotflow.png)

There are two phases in the flow of one-shot learning.

**1. Training Phase**

- Here we train our embedding model; its architecture is shown in the workflow above.
- The embedding model is trained with the triplet loss, as described earlier in the flow.
- After training is complete, we save each digit's image embeddings, called **template embeddings**, which will be used at test time (see the sketch after this list).
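The post does not spell out how the per-digit template embeddings are aggregated; a common choice, assumed here, is the mean embedding of each digit's training images. Continuing the training sketch above:

```python
import numpy as np

# Compute one template embedding per digit: the mean embedding of that
# digit's training images (an assumption; the post doesn't specify the
# aggregation). `embedding_model`, `x_train`, and `y_train` come from
# the training sketch above.
train_embeddings = embedding_model.predict(x_train / 255.0)
template_embeddings = {
    digit: train_embeddings[y_train == digit].mean(axis=0)
    for digit in range(10)
}
# Saved for test time; reload with np.load(..., allow_pickle=True).item().
np.save("template_embeddings.npy", template_embeddings)
```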
**2. Testing Phase**

- Here, unseen data, known as the test set, is passed through the embedding model.
- The unseen data is thereby converted into n-dimensional embeddings.
- The template embeddings are loaded and a custom generator is prepared.
- The custom generator generates pairs consisting of a test-image embedding and a loaded template embedding.
- The last step is to compare the embeddings in each pair.
- To measure the similarity of a pair, we use cosine similarity, as in the sketch below.
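A minimal sketch of this comparison step, again continuing the earlier code (`cosine_similarity` is our own helper, standing in for the custom generator and comparison described above):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed the unseen test images and compare each one against every template.
# `embedding_model`, `template_embeddings`, `x_test`, and `y_test` come
# from the earlier sketches.
test_embeddings = embedding_model.predict(x_test / 255.0)

for emb, true_label in zip(test_embeddings[:5], y_test[:5]):
    scores = {d: cosine_similarity(emb, t) for d, t in template_embeddings.items()}
    predicted = max(scores, key=scores.get)
    print(f"true={true_label}  predicted={predicted}  similarity={scores[predicted]:.2f}")
```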
**End-to-End Examples for Better Understanding**

**Example:** Here is a test image of the digit 7 from the MNIST data. We got 98% similarity because both images represent the same digit, 7.

![Matching pair: test digit 7 vs. template digit 7](/blog/wp-ttn-blog/uploads/2022/02/correct.png)

**Example:** Here is a test image of the digit 2 from the MNIST data. We got only 12% similarity because the template image carries the digit-7 label while the test image represents the digit 2.

![Non-matching pair: test digit 2 vs. template digit 7](/blog/wp-ttn-blog/uploads/2022/02/wrong_ons.png)

**Process of Using One-Shot Learning Models on Unseen Labels**

1. The best thing about one-shot learning is that its power can be used on a class or label the model has never seen.
2. A label that is unseen by the model can be tested without retraining the model for it.
3. The idea: we do not train the model on that particular label; instead, we use the already-trained model to compute template embeddings for the unseen label.
4. The rest of the process remains the same for that class at test time.
5. The test image of the unseen class is compared to the saved embedding of that class, irrespective of training.

The best example is attendance monitoring in an institute using facial recognition. New students and staff join the institute frequently, so should we retrain the model on each new face in order to monitor attendance? The answer is simply no: we have the flexibility to save the embeddings of a new face and, at test time, just compare against those saved embeddings. This is just one use case showing the role of one-shot learning in day-to-day life; many automated applications use it as well.

**Conclusion**

In the case of **standard classification**, if we are trying to classify an image as cat, dog, horse, or elephant, then for every input image we generate 4 probabilities, indicating the probability of the image belonging to each of the 4 classes.

During the training process, we require a **large** number of images for each of the classes (cats, dogs, horses, and elephants). Also, if the network is trained only on these 4 classes, we cannot expect to test it on any class outside the given four. If we want our model to classify images of a 5th class as well, we first need to collect a lot of images of that class and then **re-train** the model. There are applications where we do not have enough data for each class, and where the total number of classes is huge and dynamically changing. In such cases, the cost of data collection and periodic re-training is too high, and this is exactly the problem one-shot learning addresses.