Semi-supervised learning uses both labeled and unlabeled data to build models. Fully supervised learning requires every example to have a correct label. Unsupervised learning uses no labels at all. Semi-supervised sits between the two, starting with a small labeled set and a much larger unlabeled pool, then combining information from both. This matters because labeling is expensive.
In medical imaging, for instance, every scan must be annotated by a radiologist, whose time is scarce and expensive. Semi-supervised methods can often approach the performance of fully supervised systems while using only a fraction of the labeled data.
The algorithm learns basic patterns from the labeled examples and then propagates that knowledge across the unlabeled data, typically by clustering similar inputs or generating pseudo-labels. This approach enables faster deployment in speech recognition, fraud detection, and autonomous driving.
Companies can train on the massive raw sensor data they already collect, supplemented by a modest set of verified examples.
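The self-training loop described above is available off the shelf in scikit-learn. The sketch below is a minimal illustration, not a production recipe: the synthetic dataset, the 5% labeling rate, and the 0.8 confidence threshold are all illustrative assumptions. By scikit-learn convention, unlabeled examples are marked with the label -1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic stand-in for a real dataset (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Keep true labels for roughly 5% of the data; mark the rest unlabeled (-1).
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) > 0.05
y_partial[unlabeled] = -1

# The wrapped classifier is refit repeatedly as high-confidence
# pseudo-labels are promoted into the training set.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y_partial)
print(f"accuracy on all data: {model.score(X, y):.2f}")
```

The `threshold` parameter controls the trade-off the article alludes to: a high threshold adds fewer but cleaner pseudo-labels, while a low one expands the training set faster at the risk of reinforcing the model's own mistakes.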
Interactive Visualizer: Semi-Supervised Learning
Watch how a model leverages both labeled and unlabeled data to improve performance by generating pseudo-labels. The visualization distinguishes three kinds of points: labeled data (expensive but high-quality), unlabeled data (abundant but unannotated), and pseudo-labeled data (labels generated by the model itself).
The model initially trains on labeled data, then uses high-confidence predictions to create pseudo-labels for unlabeled data, effectively expanding the training set and improving performance.
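The loop the visualizer animates can be written out directly. This is a from-scratch sketch of confidence-thresholded pseudo-labeling; the dataset, the 30-example labeled set, the 0.9 threshold, and the five-iteration cap are all assumed for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(n_samples=600, noise=0.2, random_state=1)
X_lab, y_lab = X[:30], y[:30]   # small, expensive labeled set
X_unl = X[30:]                  # large, cheap unlabeled pool

# Step 1: train on labeled data only.
model = LogisticRegression().fit(X_lab, y_lab)

for _ in range(5):
    # Step 2: predict on the unlabeled pool and measure confidence.
    proba = model.predict_proba(X_unl)
    conf = proba.max(axis=1)
    keep = conf >= 0.9          # only high-confidence predictions
    if not keep.any():
        break
    # Step 3: promote those predictions to pseudo-labels,
    # expanding the training set, then retrain.
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    X_unl = X_unl[~keep]
    model = LogisticRegression().fit(X_lab, y_lab)

print(f"training set grew to {len(y_lab)} examples")
```

Each pass moves only the points the current model is most sure about, which is exactly the "high-confidence predictions" step described above; points the model stays unsure about remain unlabeled.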