AI Term: Knowledge Distillation

“Knowledge Distillation” is a technique in machine learning used to transfer knowledge from a large, complex model (often called the teacher model) to a smaller, simpler model (often called the student model).

The main idea is to create a smaller model that performs as well as, or nearly as well as, the larger model while being far more computationally efficient. This is particularly useful for deploying machine learning models on hardware with limited computational resources, such as smartphones or embedded devices.

Here’s how it works, in simplified terms:

  1. The large model is trained as usual on the available data, learning to predict the correct outputs for given inputs.
  2. The smaller model is then trained not only to predict the correct outputs but also to mimic the behavior of the large model. This is typically done by having the student match the teacher’s output probabilities rather than just the correct class, so the student learns from the “experience” of the teacher as well as from the raw data (see the sketch after this list).

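To make step 2 concrete, here is a minimal sketch of a student training step, assuming PyTorch; the `teacher` and `student` models, the `temperature`, and the mixing weight `alpha` are illustrative placeholders rather than part of any particular implementation.

```python
# Minimal knowledge-distillation sketch (PyTorch assumed; teacher/student are hypothetical models).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mix the usual cross-entropy on hard labels with a KL term that pushes
    the student's softened probabilities toward the teacher's."""
    # Soft targets: the teacher's probabilities, softened by the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions (scaled by T^2, a common convention).
    distill = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth class labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

def train_step(student, teacher, optimizer, inputs, labels):
    teacher.eval()
    with torch.no_grad():              # the teacher is frozen; it only provides targets
        teacher_logits = teacher(inputs)
    student_logits = student(inputs)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
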
The advantage of knowledge distillation is that the student model can often perform better than if it were trained directly on the raw data alone, because it has access to the teacher’s “soft targets” (its full output probabilities), which carry more information than the “hard targets” (the correct class labels): they also reveal how the teacher ranks the incorrect classes.
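
As a small, made-up illustration (again assuming PyTorch, with invented logits and class names), the softened teacher probabilities below show that the teacher considers the second class far more plausible than the third, a relationship the single hard label discards entirely.

```python
import torch
import torch.nn.functional as F

teacher_logits = torch.tensor([4.0, 2.5, 0.1])           # hypothetical classes: cat, dog, car

hard_target = teacher_logits.argmax()                     # tensor(0) -> "cat", and nothing else
soft_targets = F.softmax(teacher_logits / 2.0, dim=-1)    # temperature T=2 softens the distribution

print(hard_target)    # tensor(0)
print(soft_targets)   # roughly tensor([0.62, 0.29, 0.09]) -> "dog" is far more cat-like than "car"
```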

To give you an analogy, consider a highly experienced doctor (teacher model) who is training a medical student (student model). The student not only learns from textbooks (raw data) but also learns by observing and imitating the experienced doctor’s procedures and decisions. The student can gain insights from the doctor’s experience, improving their own skills beyond what they could learn from textbooks alone. This is similar to what happens in knowledge distillation.
