veda.ng

SAM (Segment Anything Model) is a foundation model for image segmentation from Meta AI. Given a point, box, or rough-mask prompt, it can segment any object in any image, generalizing across domains without task-specific training. SAM was trained on over 1 billion masks from 11 million images (the SA-1B dataset) using a data engine that iteratively improved both the model and the dataset. This scale enables remarkable zero-shot generalization: SAM segments objects it has never seen, in domains far from its training data.

The architecture separates the heavy image encoder (run once per image) from the lightweight mask decoder (run once per prompt), enabling interactive segmentation: users click to indicate what to segment and receive near-instant mask predictions. SAM supports multiple prompt types: point prompts specify locations of interest; box prompts define regions; and mask prompts provide coarse outlines to refine. (The original paper also explored text prompts as a proof of concept, though that capability was not part of the public release.)

The 'Segment Anything' name reflects its goal of being the foundation model for segmentation: just as GPT-3 became a general-purpose language foundation, SAM aims to be a general-purpose segmentation foundation. Applications include annotation tools that greatly speed up labeling, editing tools for precise object selection, and use as a component in larger vision systems. SAM 2 extends the approach to video, tracking segmented objects across frames.
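The encode-once / decode-per-prompt split can be sketched in a few lines. The stub functions below stand in for SAM's real networks and are purely illustrative; the actual `segment_anything` package exposes this pattern through `SamPredictor.set_image` (expensive, once per image) and `SamPredictor.predict` (cheap, once per prompt).

```python
import numpy as np

def heavy_image_encoder(image):
    # Stand-in for SAM's ViT image encoder: the expensive step, run once
    # per image. Here it just produces a fake fixed-size embedding.
    seed = abs(hash(image.tobytes())) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal((256, 64, 64))

def light_mask_decoder(embedding, point, radius=50):
    # Stand-in for SAM's lightweight prompt-conditioned mask decoder,
    # run once per prompt. Here it fakes a circular mask around the click.
    h, w = 512, 512
    ys, xs = np.mgrid[0:h, 0:w]
    return (xs - point[0]) ** 2 + (ys - point[1]) ** 2 < radius ** 2

image = np.zeros((512, 512, 3), dtype=np.uint8)
embedding = heavy_image_encoder(image)            # once per image

masks = []
for click in [(100, 200), (300, 300), (400, 50)]:
    mask = light_mask_decoder(embedding, click)   # near-instant per click
    masks.append(mask)
```

Because the embedding is cached, each additional click costs only a decoder pass, which is what makes the interactive click-to-segment loop feel instant.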