Zero-Shot Learning

A Simple Explanation - By Varsha Saini

Humans can recognize objects they have never seen before from a description of them, whereas a conventional machine learning model can only predict the classes it was trained on. But what if a new class appears in the test data?

For example, suppose you have seen a horse but never a zebra. You do know, however, that a zebra looks like a horse with black and white stripes on its body. You could then probably recognize a zebra the first time you saw one. Zero-shot learning aims to give machine learning models this ability.

What is Zero-Shot Learning in Machine Learning?

Zero-shot learning is the ability of a model to recognize classes that it did not see during training. In other words, the model handles those classes without having been trained on any labelled examples of them.

Data in Zero-Shot Learning

In zero-shot learning, the data consists of seen classes, unseen classes and auxiliary information. Suppose we are training an image classification model on two classes, dog and cat, for which we have labelled images. We also have descriptions of other animals, such as cow, elephant and pig, but no images of them.

1. Seen Classes

Classes for which labelled images are available during training. In our example, dog and cat are the seen classes.

2. Unseen Classes

Classes for which no labelled images are available during training. In our example, cow, elephant and pig are the unseen classes.

3. Auxiliary Information

Auxiliary information acts as a bridge between seen and unseen classes. It is available for both seen and unseen classes and typically takes the form of class descriptions or word embeddings of the class names.
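The three kinds of data above can be sketched as plain Python data structures. Everything here is hypothetical: the attribute names, the attribute vectors used as auxiliary information, and the placeholder feature vectors standing in for images. The point is only that labelled examples exist for the seen classes, while the auxiliary information covers every class.

```python
# Hypothetical attribute vocabulary used as auxiliary information.
ATTRIBUTES = ["barks", "meows", "has_trunk", "has_horns", "pink_skin", "is_large"]

# Auxiliary information: one (made-up) attribute vector per class,
# available for seen AND unseen classes.
class_attributes = {
    "dog":      [1, 0, 0, 0, 0, 0],
    "cat":      [0, 1, 0, 0, 0, 0],
    "cow":      [0, 0, 0, 1, 0, 1],
    "elephant": [0, 0, 1, 0, 0, 1],
    "pig":      [0, 0, 0, 0, 1, 0],
}

# Labelled training data exists only for dog and cat.
# The feature vectors are placeholders for real image features.
labelled_train = [
    ([0.9, 0.1, 0.3], "dog"),
    ([0.8, 0.2, 0.4], "dog"),
    ([0.1, 0.9, 0.2], "cat"),
]

def split_classes(labelled, class_attributes):
    """Seen classes have labelled examples; unseen classes appear
    only in the auxiliary information."""
    seen = {label for _, label in labelled}
    unseen = set(class_attributes) - seen
    return seen, unseen

seen, unseen = split_classes(labelled_train, class_attributes)
print(sorted(seen))    # ['cat', 'dog']
print(sorted(unseen))  # ['cow', 'elephant', 'pig']
```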

Types of Zero-Shot Learning

Zero-shot learning can be categorized by the data available at training time or at test time. Inductive and transductive zero-shot learning differ in the training data, whereas conventional and generalized zero-shot learning differ in the test data.

1. Inductive Zero-Shot Learning

During training, labelled images are available only for the seen classes, while auxiliary (semantic) information is available for both seen and unseen classes.

In this type of zero-shot learning, the model uses the seen classes to learn how visual features relate to the semantic descriptions, and then transfers this knowledge to the unseen classes through their descriptions. This allows it to recognize images from unseen classes at test time.
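A minimal sketch of inductive zero-shot learning, assuming the auxiliary information is a binary attribute vector per class and that each image has already been reduced to a small feature vector. It uses a simplified attribute-based approach, in the spirit of direct attribute prediction but with hard 0/1 decisions: for each attribute, a nearest-centroid classifier is trained on seen-class images only, and a new image is labelled with the unseen class whose attribute signature best matches the predicted attributes. The classes, attributes and feature values are all made up for illustration, echoing the horse-and-zebra example.

```python
# Hypothetical attribute vocabulary shared by all classes.
ATTRIBUTES = ["horse_shaped", "striped", "long_ears"]

# Auxiliary information: attribute signatures for seen and unseen classes.
SEEN = {"horse": [1, 0, 0], "tiger": [0, 1, 0], "rabbit": [0, 0, 1]}
UNSEEN = {"zebra": [1, 1, 0], "donkey": [1, 0, 1]}

# Labelled training images exist only for seen classes.
# Each "image" is a made-up 3-d visual feature vector.
TRAIN = [
    ([0.9, 0.1, 0.1], "horse"), ([0.8, 0.2, 0.0], "horse"),
    ([0.1, 0.9, 0.1], "tiger"), ([0.2, 0.8, 0.0], "tiger"),
    ([0.1, 0.1, 0.9], "rabbit"), ([0.0, 0.2, 0.8], "rabbit"),
]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_attribute_centroids(train, signatures):
    """For each attribute, store the mean feature vector of seen images whose
    class has the attribute (pos) and of those whose class lacks it (neg)."""
    centroids = []
    for k in range(len(ATTRIBUTES)):
        pos = [x for x, y in train if signatures[y][k] == 1]
        neg = [x for x, y in train if signatures[y][k] == 0]
        centroids.append((mean(pos), mean(neg)))
    return centroids

def predict_attributes(x, centroids):
    """Binary attribute vector: 1 where x is closer to the positive centroid."""
    return [1 if sq_dist(x, pos) < sq_dist(x, neg) else 0 for pos, neg in centroids]

def classify(x, centroids, candidates):
    """Pick the candidate class whose signature best matches the predicted attributes."""
    attrs = predict_attributes(x, centroids)
    return min(candidates, key=lambda c: sq_dist(attrs, candidates[c]))

centroids = train_attribute_centroids(TRAIN, SEEN)
zebra_image = [0.9, 0.8, 0.1]  # horse-shaped and striped, short ears
print(predict_attributes(zebra_image, centroids))  # [1, 1, 0]
print(classify(zebra_image, centroids, UNSEEN))    # zebra
```

No zebra image is used for training; the zebra is recognized only because "horse_shaped" and "striped" were learned from horses and tigers. Real systems use learned image features and probabilistic or embedding-based compatibility scores rather than this toy classifier.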

2. Transductive Zero-Shot Learning

During training, labelled images are available for the seen classes, and auxiliary (semantic) information is available for both seen and unseen classes. In addition, unlabelled images from the unseen classes are also available.
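One simple way to exploit these extra unlabelled images is k-means-style prototype refinement: start from the unseen classes' attribute signatures and let the unlabelled images pull each prototype towards where that class actually falls. This is a sketch under stated assumptions, not the only transductive method: the attribute scores are assumed to come from some inductive model that is not shown, and all values, classes and attributes are invented.

```python
# Hypothetical unseen-class signatures: [horse_shaped, striped, long_ears].
UNSEEN = {"zebra": [1.0, 1.0, 0.0], "donkey": [1.0, 0.0, 1.0]}

# Made-up attribute scores that an inductive model (not shown) produced for
# the UNLABELLED unseen-class images available at training time. The scores
# drift away from the ideal 0/1 signatures, mimicking domain shift.
# The true classes in the comments are hidden from the algorithm.
unlabelled = [
    [0.70, 0.45, 0.35], [0.75, 0.50, 0.30], [0.70, 0.55, 0.30],  # zebras
    [0.70, 0.20, 0.55], [0.65, 0.15, 0.60], [0.70, 0.25, 0.50],  # donkeys
    [0.70, 0.38, 0.42],                                          # an ambiguous zebra
]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, prototypes):
    """Label of the prototype closest to x."""
    return min(prototypes, key=lambda c: sq_dist(x, prototypes[c]))

def refine_prototypes(points, prototypes, n_iters=10):
    """Start from the class signatures, then repeatedly assign each unlabelled
    point to its nearest prototype and move each prototype to the mean of
    the points assigned to it."""
    protos = {c: list(v) for c, v in prototypes.items()}
    for _ in range(n_iters):
        groups = {c: [] for c in protos}
        for x in points:
            groups[nearest(x, protos)].append(x)
        for c, members in groups.items():
            if members:  # keep the old prototype if no point was assigned
                protos[c] = [sum(col) / len(members) for col in zip(*members)]
    return protos

ambiguous = unlabelled[-1]
print(nearest(ambiguous, UNSEEN))   # donkey (raw signatures only)
refined = refine_prototypes(unlabelled, UNSEEN)
print(nearest(ambiguous, refined))  # zebra (after using the unlabelled images)
```

In this toy data the ambiguous image sits closer to the donkey signature, but once the prototypes adapt to the unlabelled images it joins the zebra cluster.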

3. Conventional Zero-Shot Learning

At test time, the images to be recognized belong only to the unseen classes. This setting is somewhat unrealistic, because in practice we rarely know in advance that a test image comes from an unseen class.
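In code, the difference between the test settings comes down to which labels the final prediction may choose from. A minimal sketch, assuming a model has already produced a compatibility score for every class; the scores and the helper name `predict` are hypothetical.

```python
SEEN = {"dog", "cat"}
UNSEEN = {"cow", "elephant", "pig"}

def predict(scores, candidates):
    """Return the highest-scoring label among the allowed candidate classes."""
    return max(candidates, key=lambda c: scores[c])

# Hypothetical compatibility scores for one test image (higher = better match).
scores = {"dog": 0.62, "cat": 0.40, "cow": 0.55, "elephant": 0.10, "pig": 0.30}

# Conventional ZSL: the image is guaranteed to come from an unseen class,
# so only unseen labels are considered.
print(predict(scores, UNSEEN))         # cow
# Without that guarantee, the same scores favour a seen class.
print(predict(scores, SEEN | UNSEEN))  # dog
```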

4. Generalized Zero-Shot Learning

At test time, the images to be recognized can belong to either seen or unseen classes. Generalized zero-shot learning is challenging because the model tends to be biased towards the seen classes, the only ones for which it saw labelled images during training. However, it is closer to realistic use cases.
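One published remedy for this bias is calibrated stacking: search over all classes, but subtract a constant from every seen-class score before picking the best match. The sketch below uses hypothetical scores and a hand-picked constant `gamma`; in practice `gamma` would be tuned on validation data, since a larger value trades seen-class accuracy for unseen-class accuracy.

```python
SEEN = {"dog", "cat"}
UNSEEN = {"cow", "elephant", "pig"}

def calibrated_predict(scores, seen, unseen, gamma):
    """Generalized ZSL prediction with calibrated stacking: every class is a
    candidate, but seen-class scores are reduced by gamma to offset the bias."""
    candidates = seen | unseen
    adjusted = {c: scores[c] - gamma if c in seen else scores[c] for c in candidates}
    return max(adjusted, key=adjusted.get)

# Hypothetical scores for an image that actually shows a cow.
scores = {"dog": 0.62, "cat": 0.40, "cow": 0.55, "elephant": 0.10, "pig": 0.30}
print(calibrated_predict(scores, SEEN, UNSEEN, gamma=0.0))  # dog (biased towards a seen class)
print(calibrated_predict(scores, SEEN, UNSEEN, gamma=0.1))  # cow
```

A confidently recognized seen-class image (for example, a dog scoring well above every other class) still keeps its label after the small adjustment.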