Semi-supervised learning uses both labeled and unlabeled data to build models. Fully supervised learning requires a correct label for every example; unsupervised learning uses no labels at all. Semi-supervised learning sits between the two: it starts with a small labeled set and a much larger unlabeled pool, then combines information from both.

This matters because labeling is expensive. In medical imaging, for example, annotating a single scan can require hours of a specialist radiologist's time. Semi-supervised methods can approach the performance of fully supervised systems while using only a fraction of the labeled data. The algorithm learns basic patterns from the labeled examples and then propagates that knowledge across the unlabeled data, typically by clustering similar inputs or by generating pseudo-labels.

This approach enables faster deployment in domains such as speech recognition, fraud detection, and autonomous driving: companies can train on the massive raw sensor data they already collect, supplemented by a modest set of verified examples.
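The pseudo-labeling idea described above can be sketched in a few lines. This is a minimal self-training loop on toy 1-D data, not any particular library's implementation: points close enough to an already-labeled point adopt its label and then act as labeled data themselves, so labels spread through dense regions while ambiguous points stay unlabeled. The data, the distance threshold, and the function names are illustrative assumptions.

```python
def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x."""
    return min(((lbl, abs(x - p)) for p, lbl in labeled), key=lambda t: t[1])

def self_train(labeled, unlabeled, max_dist=1.0):
    """Repeatedly pseudo-label unlabeled points within max_dist of a
    labeled point, then treat those points as labeled in later passes."""
    labeled, pool = list(labeled), list(unlabeled)
    changed = True
    while changed and pool:
        changed, remaining = False, []
        for x in pool:
            lbl, dist = nearest_label(x, labeled)
            if dist <= max_dist:            # confident: adopt the pseudo-label
                labeled.append((x, lbl))
                changed = True
            else:                           # too far from any label: defer
                remaining.append(x)
        pool = remaining
    return labeled, pool

seeds = [(0.0, "A"), (10.0, "B")]           # the small labeled set
unlabeled = [0.8, 1.6, 2.4, 9.2, 8.5, 5.0]  # the larger unlabeled pool
labeled, still_unlabeled = self_train(seeds, unlabeled)
```

Here the labels chain outward from the two seeds (0.8, 1.6, and 2.4 become "A"; 8.5 and 9.2 become "B"), while 5.0, which sits far from both clusters, is left unlabeled rather than guessed.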