The healthcare field has benefited immensely from recent advances in technology, ranging from nanotechnology to deep neural networks. Deep neural networks are computer systems loosely modeled on the networks of neurons in the human brain.
A form of AI, neural networks consist of a series of algorithms that mimic the way our brains work to find relationships and patterns in streams of data. They can then apply what they’ve learned to novel data sets and perform tasks.
Traditionally, these networks require an enormous amount of data to learn to make the same generalizations and predictions that humans can form from only a handful of examples. Applications for neural networks will remain limited if they continue to need so much information to learn. For that reason, there has been a recent push toward few-shot learning, which challenges artificial neural networks to learn from far fewer examples.
A more extreme version of this method is known as one-shot learning. With this model, networks can learn to discern a category or class from only a single example. Some research has shown that this is feasible. However, one team at the University of Waterloo in Canada is pushing the envelope even further. Their focus is on so-called “less than one”-shot learning, and their paper on the topic is available to read in full.
In this setting, neural networks are challenged to learn a certain number of classes when presented with fewer than one example per class. Such fast learning may seem impossible, but the team behind it has proved its feasibility both theoretically and empirically.
Humans are able to do less than one-shot learning. Imagine, for example, someone who does not know what a unicorn is. That person can see a picture of a horse and a rhinoceros and be told a unicorn is a mix of the two. Given two examples, the person learns three classes.
How Less than One-Shot Learning May Be Possible
The University of Waterloo team built its less than one-shot learner using soft labels to encode and decode information. Soft labels make it possible to extract more information from each example.
With a hard label, a data point can belong to only a single class. A soft label allows a data point to belong to several classes simultaneously. With two hard labels, it is only possible to define two different classes. However, with two soft labels, three different classes are possible.
For example, imagine that you have the labels A and B. With hard labels, you only have two classes: A and B. With soft labels, each data point can carry a blend of the two, so a point that is mostly A, a point that is mostly B, and a point that is an even mix of A and B can each be treated as a distinct class. That is three classes from two labels, and finer-grained blends of the same two labels can define even more classes.
Making less than one-shot learning possible involves using probabilistic labels: soft labels whose elements form a valid probability distribution. In other words, each data point's label is a vector of non-negative class memberships that sum to one, and each entry can be read as the probability that the point belongs to the corresponding class.
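The distinction is easy to see in code. Below is a minimal sketch of the idea, with made-up class names and probability values rather than anything taken from the paper: a hard label is a one-hot vector, while a probabilistic soft label spreads membership across classes and must still sum to one.

```python
import numpy as np

classes = ["A", "B", "C"]

# Hard label: the point belongs to exactly one class (a one-hot vector).
hard_label = np.array([0.0, 1.0, 0.0])   # "this point is class B"

# Soft (probabilistic) label: the point carries partial membership in
# several classes, with non-negative entries that sum to one.
soft_label = np.array([0.6, 0.3, 0.1])   # "mostly A, some B, a little C"

assert np.all(soft_label >= 0) and np.isclose(soft_label.sum(), 1.0), \
    "a probabilistic label must form a valid probability distribution"

print(dict(zip(classes, soft_label)))
```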
By allowing neural networks to use soft labels and probability distributions, it becomes possible for them to learn several different classes from only two labels. With three labels, the number of representable classes grows larger still, and the networks can even distinguish between non-contiguous classes. In other words, introducing soft labels moves from a black-and-white world to a grayscale one, where different shades can correspond to different classes.
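To make the "three classes from two labels" claim concrete, here is a small sketch, not the authors' code, of how two soft-labeled points can carve out three class regions. The decision rule, a distance-weighted sum of the prototypes' soft labels followed by an argmax, is a simple stand-in for the soft-label prototype classifiers studied in the paper; the positions and label values are illustrative assumptions.

```python
import numpy as np

classes = ["A", "B", "C"]

# Two prototype points on a line, each with a soft label over three classes.
prototypes = np.array([[-1.0], [1.0]])
soft_labels = np.array([
    [0.6, 0.0, 0.4],   # left point: mostly A, partly C
    [0.0, 0.6, 0.4],   # right point: mostly B, partly C
])

def predict(x):
    # Weight each prototype's soft label by inverse distance, sum the
    # weighted labels, and pick the class with the highest total.
    dists = np.linalg.norm(prototypes - x, axis=1)
    weights = 1.0 / (dists + 1e-9)
    combined = weights @ soft_labels
    return classes[int(np.argmax(combined))]

for x in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(x, "->", predict(np.array([x])))
# Points near -1 come out as A, points near +1 as B, and the middle
# region, where the two labels blend evenly, belongs to C.
```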
The Success of Less than One-Shot Learning Models
To test their theory, the University of Waterloo researchers created a series of concentric circles, each associated with a different class. The researchers wanted to build a deep learning model that would identify each circle as a different class.
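The setup is straightforward to reproduce in spirit. The sketch below generates that kind of synthetic dataset, sampling points from concentric circles and assigning each circle its own class; the radii and sample counts are arbitrary choices of this sketch, not values from the paper.

```python
import numpy as np

def concentric_circles(n_classes=6, points_per_class=200, seed=0):
    """Sample points from n_classes concentric circles, one class per circle."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for c in range(n_classes):
        radius = c + 1                       # circle c has radius c + 1
        angles = rng.uniform(0.0, 2.0 * np.pi, points_per_class)
        X.append(np.column_stack([radius * np.cos(angles),
                                  radius * np.sin(angles)]))
        y.append(np.full(points_per_class, c))
    return np.vstack(X), np.concatenate(y)

X, y = concentric_circles()
print(X.shape, y.shape)   # (1200, 2) (1200,)
```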
They used a number of different hard-label prototype methods to achieve this goal and found that their performance was poor overall. Many of these methods produced models that could not cleanly separate the classes at all, which makes a direct comparison with the soft-label prototypes difficult.
However, the researchers were able to derive a hard-label prototype configuration that was close to optimal in differentiating the classes. To successfully distinguish between six concentric circles, 16 data points were necessary. When using soft labels, however, the network could distinguish the same number of concentric circles using only five data points. Moreover, the accuracy of the soft-label system was much greater at the boundaries between the circles.
The researchers note that it could be possible to reduce the required number of hard-label data points by carefully tuning the algorithm. However, with hard labels the number of data points needed can never grow more slowly than linearly with the number of classes; soft labels can break that linear barrier.
In short, the research demonstrates the feasibility of less than one-shot learning, which had previously been viewed as impossible. The implications of this finding are intriguing.
Applying this type of learning to deep neural networks could take machine learning to much greater heights and unlock new possibilities in healthcare delivery and decision-making, as well as in many other fields. In addition, it makes it possible for neural networks to learn tasks when only a small amount of training data is available.