Optical Character Recognition

OCR (Optical Character Recognition) is the technology that converts images of text, such as scanned documents, photos of signs, and handwritten notes, into machine-readable text that can be searched, edited, and processed. Traditional OCR systems use image processing to isolate individual characters, then match each one against known character patterns.
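
The pattern-matching step of a traditional system can be illustrated with a toy example. This is a minimal sketch, not how any particular OCR engine works: the 3x5 binary templates, the `TEMPLATES` table, and the function names are all hypothetical, and it assumes the character has already been isolated and binarized.

```python
# Hypothetical 3x5 binary glyph templates ("1" = ink, "0" = background).
# Real systems use larger bitmaps and many more characters.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def pixel_agreement(glyph, template):
    """Count the pixels where the glyph and the template have the same value."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def match_character(glyph):
    """Return the label of the template that agrees with the glyph on most pixels."""
    return max(TEMPLATES, key=lambda label: pixel_agreement(glyph, TEMPLATES[label]))


# A noisy "T": one background pixel in the third row is flipped to ink.
noisy_t = ["111", "010", "110", "010", "010"]
print(match_character(noisy_t))  # prints "T"
```

Because the match is a simple pixel count, a single flipped pixel is tolerated, but a different font or a rotated glyph would quickly break it, which is the weakness that learned features address.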

Modern OCR uses deep learning: CNNs extract visual features, while sequence models like LSTMs or transformers decode character sequences. This approach handles diverse fonts, degraded image quality, and complex layouts far better than rule-based systems. Document OCR must also handle multi-column layouts, tables, headers, footnotes, and mixed content.
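
One common way a sequence model's per-frame outputs are turned into a character string is greedy CTC (Connectionist Temporal Classification) decoding. The text above does not name a specific decoding method, so treat this as an assumed illustration: the `BLANK` symbol, the tiny alphabet, and the hand-written scores are hypothetical stand-ins for what a CNN plus LSTM model would produce.

```python
BLANK = "-"  # hypothetical blank symbol that separates repeated characters


def ctc_greedy_decode(frame_scores, alphabet):
    """Pick the best symbol per frame, collapse repeats, then drop blanks."""
    # Best-scoring symbol for every time step (image column).
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]

    decoded = []
    previous = None
    for symbol in best_path:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank.
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


alphabet = [BLANK, "a", "l"]
frame_scores = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.20, 0.70],  # "l"
    [0.20, 0.10, 0.70],  # "l" again (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank, marks a boundary between the two l's
    [0.10, 0.10, 0.80],  # "l"
]
print(ctc_greedy_decode(frame_scores, alphabet))  # prints "all"
```

The blank frame is what lets the decoder output a genuine double letter instead of merging both l's into one.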

Scene text OCR recognizes text in natural images such as street signs, product labels, and storefronts. It faces additional challenges: varied orientations, perspective distortion, occlusion, and decorative fonts. Handwriting recognition is particularly challenging because writing style varies so much from person to person.

The pipeline typically has three stages: text detection (finding where text appears), text recognition (reading what the text says), and post-processing (language modeling to correct errors). Modern systems like TrOCR pair a vision transformer encoder with a text transformer decoder, treating OCR as image-to-text translation.
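
The post-processing stage can be sketched with a much simpler stand-in for a language model: snapping each recognized word to the closest entry in a known vocabulary by edit distance. This is an assumed simplification, not the method of any specific system; `LEXICON`, `correct_word`, and the `max_distance` threshold are hypothetical names and values chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_word(word, lexicon, max_distance=2):
    """Replace word with its nearest lexicon entry if it is close enough."""
    if not lexicon or word in lexicon:
        return word
    best = min(lexicon, key=lambda candidate: edit_distance(word, candidate))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical vocabulary for a receipt-processing use case.
LEXICON = ["invoice", "total", "amount", "receipt"]

raw_words = ["t0tal", "am0unt", "rece1pt", "xyzzy"]
print([correct_word(w, LEXICON) for w in raw_words])
```

Typical confusions such as `0` for `o` or `1` for `i` sit one edit away from the intended word, so they are corrected, while a word far from every lexicon entry is left unchanged rather than forced into a wrong match.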

Applications span document digitization, automated data entry, accessibility tools, receipt processing, license plate recognition, and augmented reality translation.