veda.ng

Object Detection

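Object detection is a computer vision task that identifies and localizes objects within an image by outputting both a class label and a bounding box for each detected object. Unlike image classification, which assigns a single label to an entire image, object detection handles multiple objects at different locations and scales. The output is a list of detections, each containing a class label, a bounding box (commonly encoded as x, y, width, height, or as the coordinates of two opposite corners), and a confidence score.

Two-stage detectors such as Faster R-CNN first propose candidate regions likely to contain objects, then classify and refine each proposal. This tends to be accurate but comparatively slow. Single-stage detectors such as YOLO (You Only Look Once) and SSD predict boxes and classes directly from the image in a single pass, historically trading some accuracy for real-time speed. Both families typically produce several overlapping boxes for the same object and rely on non-maximum suppression (NMS), a post-processing step that keeps the highest-scoring box and discards others that overlap it heavily. DETR (Detection Transformer) reframes detection as set prediction with a transformer, matching each prediction to at most one ground-truth object during training, which removes hand-designed components such as anchor boxes and NMS.

Object detection underpins applications in autonomous driving (detecting pedestrians, vehicles, and traffic signs), security systems (identifying people and objects), industrial inspection (finding defects), medical imaging (locating tumors), and retail (tracking inventory). The COCO dataset is the most widely used benchmark. Its primary metric is mean average precision (mAP) over 80 object categories, averaged across intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05; IoU is the area of overlap between a predicted box and a ground-truth box divided by the area of their union, so stricter thresholds demand tighter localization. Real-world deployment requires balancing accuracy, speed, and robustness to varying conditions such as lighting, weather, and occlusion.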
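Non-maximum suppression, the hand-designed step that DETR does away with, keeps the highest-scoring box among heavily overlapping candidates and discards the rest. The following is a minimal sketch in plain Python, not the implementation used by any particular detector. It assumes boxes in (x, y, width, height) form with (x, y) as the top-left corner, applies suppression separately per class, and uses a hypothetical `Detection` record and an illustrative IoU threshold of 0.5.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical record for illustration, not a library type.
    label: str
    box: tuple[float, float, float, float]  # (x, y, width, height); (x, y) is the top-left corner
    score: float


def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class non-maximum suppression."""
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        # Keep det unless an already-kept box of the same class overlaps it too much.
        if all(k.label != det.label or iou(k.box, det.box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


dets = [
    Detection("person", (10, 10, 100, 200), 0.9),
    Detection("person", (15, 12, 100, 200), 0.8),  # near-duplicate of the first box
    Detection("car", (300, 50, 120, 80), 0.7),
]
for d in nms(dets):
    print(d.label, d.score)
```

Here the second `person` box overlaps the first with an IoU of about 0.89, so only the higher-scoring one survives, while the `car` box is kept because suppression only compares boxes of the same class. Libraries such as torchvision offer a vectorized version (`torchvision.ops.nms`), which expects boxes in corner (x1, y1, x2, y2) format rather than the (x, y, width, height) format assumed here.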
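Benchmarks such as COCO count a detection as correct only if it has the right class and its box overlaps a not-yet-matched ground-truth box at or above an IoU threshold. The sketch below assumes (x, y, width, height) boxes and a hypothetical `precision_recall` helper. It matches predictions greedily in order of confidence and reports precision and recall at each of COCO's ten IoU thresholds (0.50 to 0.95). It is a simplification: the official COCO metric ranks detections per category, integrates precision over recall to obtain average precision, and then averages across categories and thresholds, none of which this sketch does.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold):
    """predictions: (label, box, score) tuples; ground_truth: (label, box) tuples."""
    matched = set()  # indices of ground-truth boxes already claimed by a prediction
    tp = 0
    # Most confident predictions get first pick of the ground-truth boxes.
    for label, box, _score in sorted(predictions, key=lambda pred: pred[2], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, (gt_label, gt_box) in enumerate(ground_truth):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    # Unmatched predictions are false positives; unmatched ground truth are misses.
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall


ground_truth = [("person", (10, 10, 100, 200)), ("car", (300, 50, 120, 80))]
predictions = [
    ("person", (15, 12, 100, 200), 0.9),  # IoU ~0.89 with the person box
    ("car", (330, 50, 120, 80), 0.8),     # IoU 0.6 with the car box
    ("dog", (0, 0, 10, 10), 0.3),         # no matching ground truth: a false positive
]
# COCO's ten IoU thresholds: 0.50, 0.55, ..., 0.95.
for t in [0.50 + 0.05 * i for i in range(10)]:
    p, r = precision_recall(predictions, ground_truth, t)
    print(f"IoU >= {t:.2f}: precision = {p:.2f}, recall = {r:.2f}")
```

With these made-up boxes, the car prediction counts as correct at an IoU threshold of 0.50 but not at 0.75. This is why averaging over stricter thresholds rewards precise localization and not just finding the object.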