04-System Design

#System-Design #LLD-HLD

# Topics covered
* Memory & Storage Systems
  * RAM, ROM, HDD, SSD
* Database Types
* Database replication
  * Synchronous replication
  * Asynchronous replication
  * Single-Master Replication
  * Multi-Master Replication
* RAID - Redundant Array of Independent Disks

Memory & Storage Systems

Random access memory (RAM)

RAM is the computer's primary (main) memory, which the CPU accesses directly.
Volatile memory
Types - SRAM, DRAM

SRAM

Static random access memory (SRAM)
Stores data as long as there is power in the system, without needing to be refreshed
Faster and more expensive than DRAM
Used for - cache memory

DRAM

Dynamic random access memory (DRAM)
Slower and cheaper than SRAM
Its cells must be refreshed periodically to retain data
Types
- SDRAM - Synchronous Dynamic Random Access Memory
- DDR SDRAM - Double Data Rate Synchronous Dynamic Random-Access Memory
  - Types: DDR4, DDR5

Read-only memory (ROM)

ROM is non-volatile and stores data permanently
Usage: Firmware, BIOS

Hard Disk Drive (HDD)

A hard disk drive is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnetic material. It is a non-volatile storage device, meaning it retains stored data even when powered off.

HDDs are typically installed internally in a computer and are attached directly to the disk controller of the computer’s motherboard.

Solid State Drive (SSD)

A solid-state drive is a newer generation of storage device used in computers. It replaces the traditional mechanical hard disk with flash-based memory, which is significantly faster.

SSDs are more resistant to physical shock, run silently, and have higher input/output rates compared to hard disk drives. They store data in cells, and the number of bits stored in each cell determines their properties.

Other

USB/Flash/Pen drive
SD cards

Databases

SNC: DBMS and Databases: /categories/database/

SQL or Relational DB

Fixed schema - data is stored in table form
Follows ACID properties
- Atomicity, Consistency, Isolation, Durability
Easy vertical scaling, difficult horizontal scaling
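
Atomicity, the "A" in ACID, can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `accounts` table and the `transfer`, `make_db`, and `balances` helpers are hypothetical names made up for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False  # the debit above was rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances untouched, because the debit is undone together with the rest of the transaction.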

NoSQL or Non-Relational or Document DB

No fixed Schema - key-value pair
Highly Scalable, Sharding
Dynamic data flexibility
Used for both - heavy read and write operations
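
Sharding can be illustrated with a minimal hash-based router. This sketch assumes a fixed number of shards and a hypothetical `shard_for` helper; real systems often use consistent hashing or range-based schemes instead, so that adding a shard does not remap most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Use a stable hash: Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The same key always lands on the same shard, so reads know where to look without a central lookup table.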

Column DB

Midway between relational and document databases
Used for write-heavy workloads
Typically does not provide full ACID guarantees
Eg: Cassandra

Search DB

Eg: Elasticsearch, Solr

There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, de-normalization, and SQL tuning.

Database replication

Data Replication is the process of storing data in more than one site or node. It is useful in improving the availability of data.

It simply means copying data from a database on one server to a database on another server, so that all users can share the same data without any inconsistency.

Advantage

Data replication allows for improved data backup
Provides fault tolerance, i.e. users can access data stored at other nodes if the current node fails
Improves the performance for retrieval
Also improves the availability of data

Disadvantage

Replication adds more hardware and additional complexity.

Consistency Models or Consistency Algorithms

Synchronous replication

Provides read-after-write consistency
The master sends each write to the replicas and waits for an ack from every replica before confirming the write

# Pros
* Replication lag is Zero
* Data is always consistent

# Cons
* Writes are slower, because each one waits for all replicas
* A write fails if even a single replica ack is missed

Asynchronous replication

The master sends each write to the replicas but does not wait for an ack from them

# Pros
* High write performance: the master does not wait for replicas

# Cons
* Replicas may lag behind the master, so reads can return stale data
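
The difference between the two modes can be sketched with a toy in-memory model. The `Replica` and `Master` classes and their methods are hypothetical names for illustration; there is no networking, and a real system would also have to undo or retry a synchronous write that only some replicas applied.

```python
class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def apply(self, key, value):
        """Apply a write; the return value plays the role of the ack."""
        if not self.healthy:
            return False
        self.data[key] = value
        return True


class Master:
    def __init__(self, replicas):
        self.replicas = replicas
        self.data = {}
        self.pending = []  # async writes not yet shipped to replicas

    def write_sync(self, key, value):
        """Confirm the write only if every replica acknowledges it."""
        acks = [r.apply(key, value) for r in self.replicas]
        if not all(acks):
            return False  # one missed ack fails the whole write
        self.data[key] = value
        return True

    def write_async(self, key, value):
        """Confirm immediately; replicas catch up later."""
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def ship_pending(self):
        """Background step: deliver queued writes to the replicas."""
        for key, value in self.pending:
            for r in self.replicas:
                r.apply(key, value)
        self.pending.clear()
```

Between `write_async` and `ship_pending` a replica does not yet hold the new value, which is the replication lag the synchronous mode avoids.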

Types of data replication methods

Single-Master Replication

Aka Master-Slave replication.

The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion.

If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
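
The routing and failover behaviour described above might look roughly like this. `SingleMasterCluster` and the node names are hypothetical, and the sketch sends reads to slaves only when any exist; health checks and choosing the most up-to-date slave are left out.

```python
class SingleMasterCluster:
    def __init__(self, master, slaves):
        self.master = master        # node name, or None while offline
        self.slaves = list(slaves)
        self._next = 0              # round-robin counter for reads

    def route(self, operation):
        """Return the node that should handle a 'read' or 'write'."""
        if operation == "write":
            if self.master is None:
                raise RuntimeError("read-only mode: no master available")
            return self.master
        if not self.slaves:
            if self.master is None:
                raise RuntimeError("no nodes available")
            return self.master
        node = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return node

    def master_failed(self):
        """Mark the master offline; the cluster becomes read-only."""
        self.master = None

    def promote(self):
        """Promote the first slave to master and return its name."""
        self.master = self.slaves.pop(0)
        return self.master
```

While `master` is `None`, reads keep working and writes are rejected, which matches the read-only mode described above.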

Use cases

As a data warehouse - a slave node can be used for report generation
As a CDN - place slave nodes close to end users

Disadvantage

Additional logic is needed to promote a slave to a master.

Multi-Master Replication

Aka Master-Master replication.

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.

Disadvantage

A load balancer is needed to determine where to write.
Most master-master systems are either loosely consistent (lazy and asynchronous, violating ACID) or have increased write latency due to synchronization.

RAID - Redundant Array of Independent Disks

It is a technique that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both.

RAID provides several benefits:

Increased data reliability: every RAID level except RAID 0 provides redundancy, which means that if one disk fails, the data can be recovered from the remaining disks in the array. This makes RAID a reliable storage solution for critical data.

Improved performance: RAID can improve performance by spreading data across multiple disks.

RAID-0 (Striping)

Data is split up into blocks that are written across all the drives in the array
A minimum of 2 drives is required

When you save a file, RAID 0 breaks the data into segments called striped units. Then it spreads that data across all of the drives in your array. This is called striping, and it helps you access data faster because you have multiple drives working together to read, write, and store data.
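
Striping can be sketched in a few lines. Here each "drive" is just a Python list, and `stripe` and `read_back` are hypothetical helper names; the stripe unit size is an arbitrary parameter.

```python
def stripe(data: bytes, num_drives: int, unit: int):
    """Split data into stripe units and deal them round-robin to drives."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), unit):
        index = (offset // unit) % num_drives
        drives[index].append(data[offset:offset + unit])
    return drives


def read_back(drives):
    """Reassemble the original data by reading the drives row by row."""
    out = []
    rows = max(len(d) for d in drives)
    for row in range(rows):
        for d in drives:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)
```

For example, `stripe(b"ABCDEFGHIJ", 2, 2)` places `AB`, `EF`, `IJ` on the first drive and `CD`, `GH` on the second. Losing either list makes the data unrecoverable, which is why RAID 0 is not fault-tolerant.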

# Pros
* Great performance

# Cons
* Not fault-tolerant
* No redundancy, so a single drive failure loses all the data

RAID-1 (Mirroring)

Data is stored twice by writing it to two drives
A minimum of 2 drives is required

Use case

RAID-1 is ideal for mission-critical storage, for instance for accounting systems.
It is also suitable for small servers in which only two data drives will be used.

# Pros
* Fault-tolerant --> Data protection

# Cons
* Usable storage capacity is only half of the total
* Write overhead, since every write goes to two drives

RAID10 or RAID 1+0

Hybrid RAID configuration: RAID 1 + RAID 0
Data is mirrored across pairs of drives, and the mirrored pairs are then striped
A minimum of 4 drives is required

If you have at least four drives, RAID 10 will increase the speed that you would have with just one drive, and you get the advantages of having redundancies. However, this also means that you have to buy more drives, and you only get half the capacity of all of them.

# Pros
* High performance
* High fault tolerance
* The rebuild time is very fast 

# Cons
* Expensive, multiple disk usage
* Capacity is only half

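RAID-5 (Block-Level Striping with one Parity)

Striping with one parity block per stripe
3 or more drives are required
If 4 drives are used
- Each stripe is divided into 3 data blocks plus 1 parity block, one block per disk
- The disk that holds the parity block rotates from stripe to stripe, so parity is distributed across all 4 disks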

RAID-5 Block-Level Striping with Distributed Parity

When you write data to this array, just like in RAID 0, your data is broken down into units and spread over the drives in your array. In addition to striping the data, the array stores a parity block for each stripe. Parity is computed as the bitwise XOR of the stripe's data blocks, so if any single drive fails, its missing block can be recomputed from the surviving data blocks and the parity block.
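
A minimal sketch of the parity idea, assuming equal-sized blocks and plain XOR parity. `xor_blocks` is a hypothetical helper name, and a real controller does this per stripe in hardware or in the kernel.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise XOR of equal-length byte blocks, column by column."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# One stripe on a 4-drive array: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# If the drive holding d1 fails, XOR-ing the survivors rebuilds it.
rebuilt = xor_blocks([d0, d2, parity])
```

Here `rebuilt` equals `d1`. The same computation recovers any one missing block, but not two at once, which is why this scheme tolerates only a single drive failure.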

# Pros
* Only one disk's worth of capacity is used for redundancy
* Can handle at most `1 disk failure`

# Cons
* Complex to implement and slow to rebuild

RAID-6 (Block-Level Striping with two Parity Blocks)

Striping with double parity
4 or more drives are required
If 4 drives are used
- Each stripe is divided into 2 data blocks plus 2 parity blocks, one block per disk
- The two parity blocks are distributed across the disks in the same rotating fashion as RAID 5

RAID-6 Block-Level Striping with two Parity Blocks

# Pros
* Can handle at most `2 disk failures`

# Cons
* Write data transactions are slower than RAID 5 due to the additional parity
* Complex to implement
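
The capacity and fault-tolerance figures for the levels above can be collected into one small calculator. This is a sketch under simplifying assumptions: all drives have the same size, RAID 1 mirrors every drive, and for RAID 10 only the guaranteed tolerance of one failure is reported (it can survive more if the failures hit different mirror pairs). `raid_summary` is a hypothetical helper name.

```python
def raid_summary(level: str, drives: int, drive_size_tb: float):
    """Return (usable_capacity_tb, guaranteed_failures_tolerated)."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "0":
        return drives * drive_size_tb, 0          # striping only
    if level == "1":
        return drive_size_tb, drives - 1          # every drive is a copy
    if level == "5":
        return (drives - 1) * drive_size_tb, 1    # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_size_tb, 2    # two drives' worth of parity
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (drives // 2) * drive_size_tb, 1       # striped mirror pairs
```

With four 2 TB drives, for instance, RAID 5 yields 6 TB usable, while RAID 6 and RAID 10 both yield 4 TB.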

What about RAID levels 2, 3, 4 and 7?
- These levels do exist but are not that common.
