veda.ng

Optical Character Recognition

OCR (Optical Character Recognition) is the technology that converts images of text, such as scanned documents, photos of signs, and handwritten notes, into machine-readable text that can be searched, edited, and processed. Traditional OCR systems use image processing to isolate individual characters, then match each one against known character patterns. Modern OCR uses deep learning: convolutional neural networks (CNNs) extract visual features, and sequence models such as LSTMs or transformers decode those features into character sequences. This approach handles diverse fonts, degraded image quality, and complex layouts far better than rule-based systems.

Document OCR must handle multi-column layouts, tables, headers, footnotes, and mixed content. Scene text OCR recognizes text in natural images such as street signs, product labels, and storefronts, and it faces additional challenges: varied orientations, perspective distortion, occlusion, and decorative fonts. Handwriting recognition is particularly difficult because writing style varies so much from person to person.

A typical pipeline has three stages: text detection (finding where text appears), text recognition (reading what the text says), and post-processing (using language modeling to correct errors). Modern systems like TrOCR pair a vision transformer encoder with a text transformer decoder, treating OCR as image-to-text translation. Applications span document digitization, automated data entry, accessibility tools, receipt processing, license plate recognition, and augmented reality translation.
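
To make the traditional approach and the three-stage pipeline concrete, here is a minimal toy sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any production OCR engine works: the 3x5 glyph templates, the small vocabulary, and every function name (`render`, `segment`, `recognize`, `correct`, `ocr`) are hypothetical. Detection is reduced to splitting the image on blank columns, recognition is nearest-template matching by pixel difference, and post-processing is a dictionary lookup standing in for a language model.

```python
from difflib import get_close_matches

# Hypothetical 3x5 glyph templates ('#' = ink, '.' = background).
TEMPLATES = {
    "C": ["###", "#..", "#..", "#..", "###"],
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}
# Hypothetical vocabulary used by the post-processing step.
VOCAB = ["CAT", "COAT", "TACO"]


def render(word):
    """Build a test image: glyphs side by side, one blank column between them."""
    return [".".join(TEMPLATES[ch][r] for ch in word) for r in range(5)]


def segment(image):
    """Detection: cut the image into glyphs wherever a column has no ink."""
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    glyphs, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last glyph
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None
    return glyphs


def distance(glyph, template):
    """Count differing pixels; glyphs of a different size never match."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        return float("inf")
    return sum(a != b
               for g_row, t_row in zip(glyph, template)
               for a, b in zip(g_row, t_row))


def recognize(glyph):
    """Recognition: pick the template with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


def correct(word):
    """Post-processing: snap the raw reading to the closest vocabulary word."""
    matches = get_close_matches(word, VOCAB, n=1, cutoff=0.6)
    return matches[0] if matches else word


def ocr(image):
    """Run all three stages and return (raw reading, corrected reading)."""
    raw = "".join(recognize(g) for g in segment(image))
    return raw, correct(raw)


clean = render("TACO")
noisy = list(clean)
# Erase two pixels on the right edge of the "O" so it resembles a "C".
noisy[1] = noisy[1][:-1] + "."
noisy[2] = noisy[2][:-1] + "."

print(ocr(clean))  # ('TACO', 'TACO')
print(ocr(noisy))  # ('TACC', 'TACO')
```

The noisy case shows why the post-processing stage exists: the recognizer misreads the damaged "O" as "C", and the vocabulary lookup recovers the intended word. The same fragility (fixed glyph sizes, clean column gaps) is what deep-learning recognizers are meant to overcome.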