veda.ng

Named Entity Recognition

Named Entity Recognition (NER) identifies and classifies named entities in text (people, organizations, locations, dates, monetary values, and other specific items), enabling structured information extraction from unstructured text. Given a sentence like 'Apple released the iPhone in San Francisco on January 9, 2007', NER identifies 'Apple' as an organization, 'iPhone' as a product, 'San Francisco' as a location, and 'January 9, 2007' as a date. The task is usually framed as sequence labeling: each token receives a tag indicating whether it is part of an entity and, if so, which type. In the BIO tagging scheme, B marks the first token of an entity, I marks a token inside (continuing) an entity, and O marks a token outside any entity; the B and I tags carry the entity type, so 'San Francisco' is tagged B-LOC I-LOC. Traditional approaches used conditional random fields (CRFs) over hand-crafted features. Modern NER typically uses transformer-based models fine-tuned for token classification, which achieve high accuracy on standard entity types such as people, organizations, and locations. Challenges include nested entities (an organization name that contains a location name, as in 'University of California'), ambiguity (Apple the company vs. apple the fruit), and domain-specific entities (drug names, gene names). Cross-lingual NER transfers patterns learned from well-resourced languages to languages with less training data. NER is a foundational component of information extraction pipelines, supporting knowledge graph construction, document summarization, search enhancement, and compliance monitoring.
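
To make the sequence-labeling framing concrete, here is a minimal sketch of how token-level BIO tags can be collapsed back into entities, using the example sentence above. The tokenization, the label names (ORG, PRODUCT, LOC, DATE), and the helper name `bio_to_spans` are illustrative assumptions, not part of any particular library; in practice the tags would come from a trained model rather than being written by hand.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration. Assumption: an I- tag that does not
    continue an open entity of the same type is treated like O and dropped.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any, and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B- always starts a new entity, closing any open one.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # I- continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # O, or a stray I- tag: close any open entity.
            flush()
    flush()
    return spans


# Hand-written tags for the example sentence (assumed tokenization and labels).
tokens = ["Apple", "released", "the", "iPhone", "in", "San", "Francisco",
          "on", "January", "9", ",", "2007"]
tags = ["B-ORG", "O", "O", "B-PRODUCT", "O", "B-LOC", "I-LOC",
        "O", "B-DATE", "I-DATE", "I-DATE", "I-DATE"]

entities = bio_to_spans(tokens, tags)
for text, label in entities:
    print(f"{label}: {text}")
```

Because the tokens are joined with single spaces, the date comes back as 'January 9 , 2007'; recovering the exact original text would require keeping character offsets for each token, which this sketch leaves out.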