System Design: Database Partitioning and Sharding


📝 Topics Covered

1. What is Database Partitioning?
2. Why Partition Databases?
3. Core Partitioning Methods
- 3.1 Vertical Partitioning (Normalization)
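- 3.2 Horizontal Partitioning (Sharding)
- 3.3 Pros & Cons of Sharding
4. Sharding Routing Strategies
- 4.1 Key-Based or Hash-Based Partitioning
- 4.2 Range-Based Partitioning
- 4.3 Directory-Based Partitioning
5. Understanding Hashing
- 5.1 Consistent Hashing
6. References & Further Reading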

1. What is Database Partitioning?

Database partitioning is a core building block of modern distributed database management systems (distributed DBMSs). It is the process of splitting a single large logical dataset into smaller, independent partitions that can be distributed across multiple physical servers and managed separately.

Because each partition holds smaller tables and indexes, reads and writes touch less data, and administrative tasks such as backups operate on smaller units.

[!NOTE] Review EnjoyAlgorithms’ Data Partitioning System Design Concept and Karan Pratap Singh’s System Design Sharding Primer for advanced conceptual overviews.

2. Why Partition Databases?

As an application’s user base and traffic grow, a single database node becomes a major performance bottleneck:

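Slow Operations: As tables grow from millions toward billions of rows, indexes and the working set stop fitting in memory, so queries hit disk more often and latency rises. (B-tree index depth itself grows only logarithmically with row count.)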
Network Saturation: The network interface card (NIC) of a single database host reaches its bandwidth limit.
Storage Exhaustion: A single hardware server eventually runs out of physical disk space.

3. Core Partitioning Methods

When partitioning data, database architects typically leverage one of two structural approaches:

3.1 Vertical Partitioning (Normalization)

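Vertical partitioning splits a table by columns. Specific subsets of columns are grouped into smaller, dedicated tables that share the original primary key and can be placed on separate physical database servers. It resembles normalization in that columns move into separate tables, but the goal is performance and isolation rather than removing redundancy.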

Use Cases:
- Isolating frequently read columns from rarely accessed metadata to optimize disk I/O.
- Storing highly sensitive data (e.g. password hashes, bank credentials, salary data) in a separate partition so that stronger security controls and stricter access rules can be applied to it.
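The sensitive-data use case can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production layout: the two in-memory databases stand in for two separate servers, and the table names (`user_profile`, `user_credentials`), column names, and helper functions are all hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for two separate physical servers.
profile_db = sqlite3.connect(":memory:")
secure_db = sqlite3.connect(":memory:")

# Both partitions share the primary key so rows can be re-associated later.
profile_db.execute(
    "CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
secure_db.execute(
    "CREATE TABLE user_credentials (user_id INTEGER PRIMARY KEY, password_hash TEXT)"
)


def create_user(user_id, name, email, password_hash):
    """Write each column group to the partition that owns it."""
    profile_db.execute(
        "INSERT INTO user_profile VALUES (?, ?, ?)", (user_id, name, email)
    )
    secure_db.execute(
        "INSERT INTO user_credentials VALUES (?, ?)", (user_id, password_hash)
    )


def get_profile(user_id):
    """Frequent read path: touches only the profile partition."""
    return profile_db.execute(
        "SELECT name, email FROM user_profile WHERE user_id = ?", (user_id,)
    ).fetchone()


def get_password_hash(user_id):
    """Rare, restricted path: touches only the credentials partition."""
    row = secure_db.execute(
        "SELECT password_hash FROM user_credentials WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


create_user(105, "Ada", "ada@example.com", "hash-placeholder")
print(get_profile(105))
print(get_password_hash(105))
```

Note that `create_user` writes to two databases without a distributed transaction, so a real system would need to handle a failure between the two inserts.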

Vertical Partitioning Diagram

3.2 Horizontal Partitioning (Sharding)

Horizontal partitioning involves splitting a table horizontally based on rows. Each partitioned table (shard) retains the exact same schema and column layout but stores a distinct, disjoint subset of rows.

Partition Key: A critical column value (e.g., User_ID) used to determine which specific shard a given database row belongs to.
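As a rough sketch of the idea, the snippet below splits rows into shards that all share one schema, using `user_id` as the partition key. The shard count, the `UserRow` type, and the simple modulo assignment are assumptions chosen for illustration.

```python
from dataclasses import dataclass

NUM_SHARDS = 3  # hypothetical shard count


@dataclass(frozen=True)
class UserRow:
    # Every shard stores rows with this same schema.
    user_id: int
    name: str


def split_into_shards(rows, num_shards=NUM_SHARDS):
    """Assign each row to exactly one shard using its partition key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row.user_id % num_shards].append(row)
    return shards


rows = [UserRow(i, f"user-{i}") for i in range(1, 8)]
shards = split_into_shards(rows)
for shard_id, shard_rows in enumerate(shards):
    print(shard_id, [r.user_id for r in shard_rows])
# 0 [3, 6]
# 1 [1, 4, 7]
# 2 [2, 5]
```

The shards are disjoint, and their union is the original table.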

Horizontal Partitioning Diagram

3.3 Pros & Cons of Sharding

Pros:
- Scalability: Capacity scales out by adding more physical database servers (shards) instead of being capped by the largest single machine.
- Optimized Query Speed: Queries that include the partition key target a much smaller subset of rows within a single shard, which keeps lookups fast.
- Fault Isolation: If a single shard goes offline, the rest of the database remains operational and accessible to most users.
- Administrative Manageability: Backup, restoration, and index rebuilding tasks are significantly faster on smaller, discrete partitions.
Cons:
- Architectural Complexity: Requires coordinating routing algorithms and managing distributed system nodes.
- Cross-Shard Joins: Joining data that lives on different shards requires moving rows over the network, which is far slower than a local join.
- Hard to Reverse: Undoing sharding or changing the partition key is very difficult once the data is distributed.
- Hotspots & Resharding: If traffic is unevenly distributed, a single shard can become a performance hotspot, requiring complex rebalancing.
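The cost of cross-shard queries can be seen in a small scatter-gather sketch. The data, the `order_id % 3` placement, and the function names are hypothetical; each dict stands in for one shard.

```python
# Each dict stands in for one shard: order_id -> (user_id, amount).
# Orders are placed by order_id % 3, so one user's orders span several shards.
SHARDS = [
    {3: (105, 20.0), 6: (200, 5.0)},
    {1: (105, 10.0), 4: (300, 7.5)},
    {2: (200, 2.5), 5: (105, 30.0)},
]


def order_by_id(order_id):
    """Single-shard query: the partition key identifies exactly one shard."""
    return SHARDS[order_id % len(SHARDS)].get(order_id)


def total_spent(user_id):
    """Cross-shard query: user_id is not the partition key, so every shard
    must be asked (scatter) and the partial sums combined (gather)."""
    shards_contacted = 0
    total = 0.0
    for shard in SHARDS:
        shards_contacted += 1
        total += sum(amount for uid, amount in shard.values() if uid == user_id)
    return total, shards_contacted


print(order_by_id(5))    # (105, 30.0)
print(total_spent(105))  # (60.0, 3)
```

A lookup by partition key contacts one shard, while the aggregate by `user_id` contacts all of them. In a real cluster each contact is a network round trip.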

4. Sharding Routing Strategies

To write and read data from a sharded cluster, the system must determine which shard contains the target row based on the Partition Key:

4.1 Key-Based or Hash-Based Partitioning

Uses a hashing function to map a static key (like User_ID) to a shard number.

$$\text{Shard ID} = \text{Hash}(\text{Key}) \bmod \text{Total Shards}$$

Example: With 4 shards (numbered 0 to 3) and a User_ID of 105, if $\text{Hash}(105) = 1005$, then $1005 \bmod 4 = 1$. The data is routed to Shard 1.
Cons: Adding or removing a shard changes the modulo divisor, which invalidates most existing mappings. Going from $N$ to $N + 1$ shards with a uniform hash moves roughly $N / (N + 1)$ of the keys, so almost all data must be migrated.
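The routing rule and its resharding cost can be checked with a short sketch. SHA-256 is used here only as an example of a stable hash (the notes do not prescribe one), and the key format and helper names are hypothetical.

```python
import hashlib


def stable_hash(key):
    """Deterministic hash. Python's built-in hash() is salted per process for
    strings, so it is not suitable for routing across machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key, total_shards):
    """Shard ID = Hash(Key) mod Total Shards."""
    return stable_hash(key) % total_shards


def fraction_moved(keys, old_total, new_total):
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_total) != shard_for(k, new_total)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
print(shard_for("user-105", 4))
print(f"{fraction_moved(keys, 4, 5):.1%} of keys change shard going from 4 to 5")
```

With a uniform hash, a key keeps its shard when going from 4 to 5 shards only about one time in five, so the printed fraction should be close to 80%.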

4.2 Range-Based Partitioning

Groups database rows into shards based on continuous ranges of a partition key value (e.g., dates, alphabetical order, or numerical IDs).

Example: Shard 1 stores data for January, Shard 2 stores February, and so on.
Pros: Simple to implement; adding new shards for future ranges is straightforward.
Cons: Highly susceptible to hotspots (e.g. all write traffic targeting the current month’s active shard while older shards remain idle).
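A range lookup is typically a binary search over sorted boundaries. The sketch below assumes a hypothetical date layout and zero-based shard indices, and uses the standard-library `bisect` module. It also shows the hotspot effect: writes stamped with recent dates all land on the last shard.

```python
import bisect
from collections import Counter
from datetime import date

# Hypothetical layout. Each boundary is the first day of the NEXT shard's range:
# shard 0: before 2024-02-01 | shard 1: February 2024 | shard 2: 2024-03-01 onward
BOUNDARIES = [date(2024, 2, 1), date(2024, 3, 1)]


def shard_for_date(day):
    """Binary-search the sorted boundaries to find the owning shard."""
    return bisect.bisect_right(BOUNDARIES, day)


print(shard_for_date(date(2024, 1, 15)))  # 0
print(shard_for_date(date(2024, 2, 1)))   # 1
print(shard_for_date(date(2024, 3, 1)))   # 2

# Hotspot: a stream of writes with current dates all hits the newest shard.
recent_writes = [date(2024, 3, d) for d in range(1, 29)]
writes_per_shard = Counter(shard_for_date(d) for d in recent_writes)
print(writes_per_shard)  # Counter({2: 28})
```

Adding a shard for a future range only requires appending a boundary; existing rows stay where they are.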

4.3 Directory-Based Partitioning

Decouples the partitioning logic from the key values by using a centralized lookup directory that maps partition keys directly to physical Shard IDs.

Pros: Dynamic and flexible; data can be rebalanced or moved between shards without changing database keys or hashing functions.
Cons: The lookup directory becomes a single point of failure (SPOF) unless it is replicated, and a performance bottleneck unless lookups are heavily cached.
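The flexibility described above comes from the extra level of indirection. In this minimal sketch the directory is a plain in-memory dict, and the class, shard names, and helper functions are hypothetical. A real directory would be a replicated, cached service.

```python
class ShardDirectory:
    """In-memory lookup table mapping partition keys to shard IDs."""

    def __init__(self):
        self._key_to_shard = {}

    def assign(self, key, shard_id):
        self._key_to_shard[key] = shard_id

    def lookup(self, key):
        return self._key_to_shard[key]  # KeyError if the key was never assigned


# Each inner dict stands in for one physical shard.
shards = {"shard-a": {}, "shard-b": {}}
directory = ShardDirectory()


def write(key, value, shard_id):
    shards[shard_id][key] = value
    directory.assign(key, shard_id)


def read(key):
    return shards[directory.lookup(key)][key]


def move(key, target_shard):
    """Rebalance one key without changing the key or any hash function."""
    source_shard = directory.lookup(key)
    if source_shard == target_shard:
        return
    shards[target_shard][key] = shards[source_shard][key]  # copy the row first
    directory.assign(key, target_shard)                    # then flip the pointer
    del shards[source_shard][key]                          # finally clean up


write(105, "Ada", "shard-a")
move(105, "shard-b")
print(directory.lookup(105), read(105))  # shard-b Ada
```

Every read and write goes through `directory.lookup`, which is why the directory's availability and latency bound those of the whole cluster.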

Directory Sharding Diagram

5. Understanding Hashing

Hashing is the process of mapping data of arbitrary size to fixed-size values using a hash function. In system design, hashing is primarily used to distribute traffic or locate records quickly.

5.1 Consistent Hashing

Consistent hashing is a specialized distributed hashing technique that maps both nodes (servers) and data keys to a circular structure known as a hash ring.

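Each key is stored on the first node found when moving clockwise from the key's position on the ring. When a node is added to or removed from the ring, only the keys in the affected arc are remapped: on average about $K/N$ keys for $K$ keys and $N$ nodes, which minimizes the impact of topology changes on the system.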
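A hash ring can be sketched as a sorted list of positions searched with `bisect`. The version below is one possible layout, not a reference implementation: the `HashRing` class, the SHA-256-based hash, the node names, and the choice of 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes  # virtual nodes per server, to even out the arcs
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        # The modulo wraps past the last position back to the start of the ring.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved after adding a fifth node")
```

Going from four to five nodes should move roughly one fifth of the keys, and every moved key lands on the new node. With modulo-based routing, the same change would move most of the keys.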

[!TIP] Read Toptal’s Consistent Hashing Deep Dive Guide to study concrete implementation examples and virtual node allocations.

6. References & Further Reading

Distributed Sharding: Database Partitioning and Sharding Strategies
Consistent Hashing: Consistent Hashing Ring Implementations
System Design Hub: Karan Pratap Singh System Design Sharding Primer