veda.ng

Image Segmentation

Image segmentation assigns a label to each pixel in an image, giving dense, pixel-level understanding rather than image-level or box-level predictions. There are three main variants. Semantic segmentation assigns class labels (road, car, person) without distinguishing between instances. Instance segmentation identifies individual objects (car 1, car 2, car 3) and produces a separate mask for each. Panoptic segmentation combines both, providing class labels for "stuff" categories (sky, road) and instance-specific masks for "thing" categories (individual cars, people).

The U-Net architecture, with its encoder-decoder structure and skip connections, became foundational for medical image segmentation. Mask R-CNN extends Faster R-CNN with a mask prediction branch for instance segmentation. More recent approaches such as the Segment Anything Model (SAM) produce masks from prompts such as points or boxes and learn general segmentation capabilities that transfer across domains.

Applications span medical imaging (tumor boundaries, organ delineation), autonomous driving (understanding road scenes), satellite imagery (land use classification), photo editing (background removal, object selection), and augmented reality (real-time scene understanding). Evaluation commonly uses mean Intersection over Union (mIoU): for each class, the area where the predicted and ground-truth masks overlap is divided by the area of their union, and the per-class scores are averaged. Fine-grained pixel-level annotation is expensive, which drives research into weakly supervised and self-supervised segmentation methods.
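The mIoU metric mentioned above can be illustrated with a small sketch. This is a minimal pure-Python version for a single pair of semantic masks, not a reference implementation: the function names `per_class_iou` and `mean_iou`, the toy masks, and the class ids (0 = road, 1 = car, 2 = person) are all made up for illustration. It also assumes that classes absent from both masks are skipped when averaging; many benchmarks instead accumulate intersections and unions over a whole dataset before dividing.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of integer class ids, one per pixel


def per_class_iou(pred: Mask, target: Mask, num_classes: int) -> Dict[int, float]:
    """Return IoU for every class that appears in pred or target."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count[p] += 1
            target_count[t] += 1
            if p == t:
                intersection[p] += 1

    ious: Dict[int, float] = {}
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both masks
            ious[c] = intersection[c] / union
    return ious


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Average the per-class IoU scores (0.0 if no class is present)."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


# Toy 3x4 masks with hypothetical labels: 0 = road, 1 = car, 2 = person
target = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 2],
]
pred = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

print(per_class_iou(pred, target, num_classes=3))
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the prediction misses the single "person" pixel entirely, so that class scores an IoU of 0 and pulls the mean down sharply even though road and car are mostly right. That sensitivity to small or rare classes is one reason mIoU is often preferred over plain pixel accuracy.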