# Topic covered
* Database partitioning
    * Vertical partitioning (aka Normalisation)
    * Horizontal partitioning (aka Database Sharding)
* Hashing
    * Consistent hashing

Database partitioning

Database partitioning is the backbone of modern DBMS(distributed database management systems). It is a process of dividing a large dataset into several small partitions placed on different machines.

In other words, it is a way of partitioning data like tables or index-organized tables into smaller pieces so that data can be easily accessed and managed.

https://www.enjoyalgorithms.com/blog/data-partitioning-system-design-concept

https://github.com/karanpratapsingh/system-design#sharding

Why database partitioning?

With the growth in services and user base, it becomes tricky for a single server or database to function efficiently. We may experience lower performance with the architecture of a single database server.

Here is some situation that could arise:

  • Database operations become slower.
  • Network bandwidth starts reaching the saturation level.
  • The database server starts running out of disk space at some point.

DB partitioning types

.

  • Vertical partitioning (aka Normalisation)
  • Horizontal partitioning (aka Database Sharding)

Vertical partitioning(aka Normalisation)

In vertical partitioning, we partition the data vertically based on columns. We divide the tables into relatively smaller tables with fewer elements, and each part is present in a separate partition. It is also referred to as Normalisation

Usage

An ideal scenario for this type of partition is when you don’t need all the information about the customer in your query.

Can handle the critical part of data differently from the not so crucial part of our data. For example, we can store sensitive data like passwords, salary information, etc., in a separate partition to restrict access or provide additional security controls.

Vertical partitioning

Horizontal partitioning (aka Database Sharding)

In horizontal partitioning we split the table data horizontally based on rows and have the same number of columns. Partision key is used to decide which row goes on which table.

In below table ID is used a partision key Horizontal partitioning

Advantages

Scalability: Proves to increase scalability by distributing the data across multiple partitions.

Query Performance: Improves the performance of the system. Instead of querying the whole database, now the system has to query only a smaller partition.

Reliability and accessibility If any shard fails, the majority of the system remains accessible.

Security: Helps improve the system’s security by storing sensitive and non-sensitive data in different partitions. This could provide better manageability and security to sensitive data.

Data Manageability: Divides tables and indexes into smaller and more manageable units.

Disadvantages

Complexity: Sharding increases the complexity of the system in general.

Joins across shards: Each shards runs on different server. Some cross-shard queries can be very expensive or just not possible.

Undo sharding Once sharding is done it’s very hard if not impossible on some system to undo sharding.

Rebalancing: If the data distribution is not uniform or there is a lot of load on a single shard, in such cases, we have to rebalance our shards so that the requests are as equally distributed among the shards as possible.

When to use sharding?

Here are some reasons why sharding might be the right choice:

  • Leveraging existing hardware instead of high-end machines.
  • Maintain data in distinct geographic regions.
  • Quickly scale by adding more shards.
  • Better performance as each machine is under less load.
  • When more concurrent connections are required.

.

Key or hash-based partitioning

.

This strategy divides the rows into different partitions based on a hashing algorithm rather than grouping database rows based on continuous indexes.

Hash value determines which database partition to use. So it should be calculated based on static key(like user_id), that does not change.

For example:

Suppose we have 4 database partitions, and each request contained an application id. So we can simply perform a modulo operation on the application id with 4 and take the remainder to determine the partition to place the data.

Cons:

Increase or decrease of db partition is an issue. The keys need to be remapped and migrated to a new partition, and the hash function will need to be changed.

Range based sharding

.

In Range partitioning, we partition the table in such a way that each partition contains rows within a given range defined by the partition key.

Eg:Use partitioning key from 01-Jan-2022 to 31-Jan-2022

Pros

Increase or decrease of db partition does not have any issue

Cons:

Hotspot - caused by uneven distribution of Data

Directory based sharding

Directory-based sharding utilizes a lookup table that keeps track of which shard holds what data. In other words, it specifies a one-to-one mapping of the data with the shard that it is stored in.

Consider the example below. A column(ID) from the original table is selected as shard key. Each shard key is then given a specific shard ID, which tells which shard has the data with its corresponding shard key. In this way, the rows of all the rows in the original table are divided into different shards.

Directory based sharding

Hashing

Hashing is a process of mapping data of arbitrary size (or keys) to fixed-size values (hash codes or hash values) using a hash function.

The primary purpose of hashing is to quickly locate a data record given its search key. Hash functions are designed to produce a relatively uniform distribution of hash codes for different inputs.

.

Consistent hashing

Consistent Hashing is a specific technique used in distributed computing to distribute data across multiple nodes while minimizing the impact of adding or removing nodes from the system.

It’s designed to address some of the challenges of traditional hashing when used in distributed systems.

https://www.toptal.com/big-data/consistent-hashing