Database sharding horizontally partitions data across multiple database instances based on a shard key, enabling a system to scale beyond single-machine capacity by distributing load across independent servers. Each shard holds a subset of the total data: for example, users with IDs 0-999,999 on shard 1 and IDs 1,000,000-1,999,999 on shard 2. Queries targeting a single user route to one shard; queries spanning all users must hit every shard and aggregate the results.

The choice of shard key is critical. A good shard key distributes both data and query load evenly across shards. Poor choices create hotspots: sharding by country puts disproportionate load on the US shard, and sharding by creation date concentrates traffic on the shard holding recent data while older shards sit idle. Changing the shard key after deployment requires an expensive data migration.

Cross-shard operations are challenging. Transactions spanning shards require distributed coordination protocols that are slow and complex; joins across shards must be implemented at the application level; foreign key relationships across shards cannot be enforced by the database.

Resharding (adding or removing shards as data grows) requires careful orchestration to maintain availability during the migration. Consistent hashing minimizes the amount of data that must move when shards are added or removed.

Despite the complexity, sharding is what enables internet-scale systems: Facebook, Google, and Twitter all shard aggressively. Sharding decisions cascade through the application architecture, since data models, query patterns, and operational procedures must all accommodate shard-awareness. Modern distributed databases such as CockroachDB and Vitess automate some of this complexity.
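The ID-range scheme above can be sketched in a few lines. This is a minimal illustration, not a production router; the function name, the shard size, and the shard count are assumptions chosen to match the ranges in the text.

```python
# Range-based shard routing sketch: 1,000,000 users per shard,
# matching the ID ranges described above. All names are illustrative.
SHARD_SIZE = 1_000_000

def shard_for_user(user_id: int, num_shards: int = 4) -> int:
    """Map a user ID to a 1-based shard number via its ID range."""
    shard = user_id // SHARD_SIZE + 1
    if shard > num_shards:
        raise ValueError(f"user_id {user_id} exceeds provisioned shards")
    return shard

print(shard_for_user(42))         # shard 1 (IDs 0-999,999)
print(shard_for_user(1_500_000))  # shard 2 (IDs 1,000,000-1,999,999)
```

Note the trade-off this makes concrete: a single-user lookup touches exactly one shard, but because IDs are often assigned sequentially, the newest shard absorbs most write traffic, which is exactly the hotspot problem the shard-key discussion warns about.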
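Because queries spanning all users must hit every shard, the application typically implements a scatter-gather pattern: fan the query out to each shard, then aggregate the partial results. A minimal sketch, assuming a hypothetical per-shard client with a `count_active()` method (stubbed here so the example is self-contained):

```python
from concurrent.futures import ThreadPoolExecutor

class FakeShard:
    """Stand-in for one shard's client connection (hypothetical API)."""
    def __init__(self, active_count: int):
        self._active = active_count

    def count_active(self) -> int:
        return self._active

def count_active_users(shards) -> int:
    # Scatter: issue the per-shard query to every shard in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.count_active(), shards))
    # Gather: aggregate the partial results in the application,
    # since no single shard can answer the global question.
    return sum(partials)

shards = [FakeShard(120), FakeShard(95), FakeShard(140)]
print(count_active_users(shards))  # 355
```

Counts and sums compose trivially like this; cross-shard joins or global ordering require far more application logic, which is why the text calls cross-shard operations challenging.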
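The resharding benefit of consistent hashing can be demonstrated directly: keys are placed on a hash ring and owned by the next shard clockwise, so adding a shard moves only the keys that fall into the new shard's arcs. This is a toy ring with virtual nodes, written to illustrate the idea; the class and shard names are invented for the example.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes; illustrative only."""

    def __init__(self, shards, vnodes: int = 100):
        # Each shard contributes many points ("virtual nodes") so load
        # spreads evenly around the ring.
        self.ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards
            for v in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # A key is owned by the first ring point at or after its hash,
        # wrapping around at the end of the ring.
        i = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[i][1]

keys = [f"user:{i}" for i in range(10_000)]
before = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
after = ConsistentHashRing(["shard-1", "shard-2", "shard-3", "shard-4"])
moved = sum(before.shard_for(k) != after.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/4
```

With naive `hash(key) % num_shards` placement, going from 3 to 4 shards would remap about three quarters of all keys; here only the keys claimed by the new shard (about a quarter) move, which is what keeps resharding migrations tractable.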