
Contrastive Learning

Contrastive learning is a self-supervised training approach that learns representations by pulling similar items together and pushing dissimilar items apart in embedding space, without requiring labeled data.

The core mechanism: given an anchor example, define positive examples (similar items that should be nearby) and negative examples (dissimilar items that should be distant), then train the model to produce embeddings where the anchor-positive distance is small and the anchor-negative distance is large. For images, positives might be augmented versions of the same image (cropped, rotated, color-shifted), while negatives are different images. The model learns to recognize invariances: features that persist across augmentations must be semantically meaningful.

SimCLR and MoCo pioneered this approach for computer vision, achieving representations that transfer well to downstream tasks without any labeled pretraining data. For multimodal learning, CLIP treats image-caption pairs as positives and unpaired items as negatives, learning to align image and text representations.

The InfoNCE loss is the standard contrastive objective: maximize the log-probability that the model correctly identifies the positive among many negatives. The number and quality of negatives significantly impact learning: more negatives generally help, especially hard negatives that are semantically similar to the anchor but distinct from it. Contrastive learning has become fundamental to modern representation learning.
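To make the InfoNCE objective concrete, here is a minimal NumPy sketch using in-batch negatives: each anchor's matching row in `positives` is the positive, and every other row in the batch serves as a negative. The function name, the use of cosine similarity, and the temperature value are illustrative choices, not a specific library's API.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives: anchors[i] should match
    positives[i]; all other rows of `positives` act as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # logits[i, j] = similarity of anchor i to candidate j, scaled by temperature
    logits = a @ p.T / temperature
    # Cross-entropy with the diagonal (the true positive) as the correct class
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pushes each anchor's embedding toward its positive and away from the other batch items, which is why larger batches (more in-batch negatives) tend to sharpen the learned representations.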
