# Topics covered
* Memory & Storage Systems
* RAM, ROM, HDD, SSD
* Database Types
* Database replication
* Synchronous replication
* Asynchronous replication
* Single-Master Replication
* Multi-Master Replication
* RAID - Redundant Array of Independent Disks
Memory & Storage Systems
Random access memory (RAM)
- RAM is the computer's primary internal memory, accessed directly by the CPU.
- Volatile memory
Types: SRAM, DRAM
SRAM
- Static random access memory (SRAM)
- Holds data without refreshing for as long as there is power in the system
- Faster and more expensive than DRAM
- Used for cache memory
DRAM
- Dynamic random access memory (DRAM)
- Slower and cheaper than SRAM
- Must be refreshed periodically, because the capacitors that hold its bits leak charge
- Types
  - SDRAM: Synchronous Dynamic Random Access Memory
  - DDR SDRAM: Double Data Rate Synchronous Dynamic Random-Access Memory
    - Types: DDR4, DDR5
Read-only memory (ROM)
- ROM is non-volatile and stores data permanently
- Usage: firmware, BIOS
Hard Disk Drive (HDD)
A hard disk drive is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage, with one or more rigid, rapidly rotating platters coated with magnetic material. It is a non-volatile storage device, meaning it retains stored data even when powered off. HDDs are typically installed internally in a computer and are attached directly to the disk controller on the computer's motherboard.
Solid State Drive (SSD)
A solid-state drive is a newer generation of storage device used in computers. It replaces the traditional mechanical hard disk with flash-based memory, which is significantly faster. SSDs are more resistant to physical shock, run silently, and have higher input/output rates than hard disk drives. They store data in cells, and the number of bits stored in each cell determines their properties.
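To make the "bits per cell" point concrete, here is a small sketch using the common NAND cell type names (SLC, MLC, TLC, QLC). The function name `voltage_levels` is only illustrative. A cell that stores n bits has to distinguish 2^n charge levels, which is why denser cells are cheaper per gigabyte but generally slower and less durable.

```python
# Common NAND flash cell types and the number of bits each cell stores.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_levels(bits_per_cell: int) -> int:
    """A cell storing n bits must distinguish 2**n charge levels."""
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {voltage_levels(bits)} levels")
```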
Other
- USB/Flash/Pen drive
- SD cards
Databases
SNC: DBMS and Databases: /categories/database/
SQL or Relational DB
- Fixed schema: data is stored in table form
- Follows ACID properties: Atomicity, Consistency, Isolation, Durability
- Vertical scaling is easy, horizontal scaling is difficult
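As an illustration of the "A" in ACID, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are made up for this example. A transfer that would overdraw an account violates a `CHECK` constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30))   # valid transfer
print(transfer(conn, "alice", "bob", 500))  # overdraft: rolled back as a whole
print(balances(conn))
```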
NoSQL or Non-Relational or Document DB
- No fixed schema, e.g. key-value pairs
- Highly scalable through sharding
- Dynamic data flexibility
- Used for both heavy read and heavy write operations
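Sharding is mentioned above without detail, so here is a minimal sketch of one simple approach, hash-based sharding. The shard names and the `shard_for` helper are hypothetical, and real databases use their own partitioning schemes. The idea is only that a stable hash of the key decides which node stores the record.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical node names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("user:42"))  # the same key always maps to the same shard
```

With plain modulo hashing, changing the number of shards remaps most keys, which is one reason production systems often prefer schemes such as consistent hashing.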
Column DB
- Midway between relational and document DBs
- Used for heavy write operations
- Does not follow ACID properties
- Eg: Cassandra
Search DB
- Eg: Elasticsearch, Solr
There are many techniques to scale a relational database: master-slave replication, master-master replication, federation, sharding, denormalization, and SQL tuning.
Database replication
Data replication is the process of storing data at more than one site or node, which improves the availability of the data. In practice it means copying data from a database on one server to another server, so that all users can share the same data without any inconsistency.
Advantage
- Data replication allows for improved data backup
- Handles fault tolerance, i.e. users can access data stored at other nodes if the current node fails
- Improves the performance of retrieval
- Also improves the availability of data
Disadvantage
- Replication adds more hardware and additional complexity.
Consistency Models or Consistency Algorithms
Synchronous replication
- Provides read-after-write consistency
- The master sends each write to the replicas and waits for an ack from each replica before confirming the write
# Pros
* Replication lag is zero
* Data is always consistent
# Cons
* Has a performance hit, since every write waits for the slowest replica
* A write fails if a single replica's ack is missed
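The behaviour described above can be sketched with two toy classes. `Replica` and `SyncMaster` are illustrative names, not part of any real database API, and the network is replaced by direct method calls. The point is that the master confirms a write only after every replica has acknowledged it.

```python
class Replica:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def apply(self, key, value):
        """Apply a write and return True as the acknowledgment."""
        if not self.healthy:
            return False  # simulates a missed ack
        self.data[key] = value
        return True


class SyncMaster:
    def __init__(self, replicas):
        self.replicas = replicas
        self.data = {}

    def write(self, key, value):
        """Confirm the write only if every replica acknowledges it."""
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            # Sketch only: replicas that already applied the write are not
            # rolled back here; a real system would have to handle that.
            raise RuntimeError("write failed: a replica did not acknowledge")
        self.data[key] = value
        return True


master = SyncMaster([Replica("r1"), Replica("r2")])
master.write("user:1", "alice")
print(all(r.data == master.data for r in master.replicas))  # replicas match the master
```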
Asynchronous replication
- The master sends each write to the replicas but does not wait for an ack from each replica
# Pros
* High write performance, since the master does not wait for replicas
# Cons
* Replicas may lag behind the master, so reads from a replica can return stale data
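A toy model of the asynchronous case follows. `AsyncMaster` and its `pending` queue are illustrative names, and the background replication process is replaced by an explicit `replicate()` call. The write is confirmed immediately, and a replica returns stale data until the queued change has been shipped.

```python
from collections import deque


class AsyncMaster:
    def __init__(self, replica_count):
        self.data = {}
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = deque()  # writes not yet shipped to the replicas

    def write(self, key, value):
        """Confirm immediately; replication happens later."""
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def replicate(self):
        """Ship queued writes to every replica (a background job in practice)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


master = AsyncMaster(replica_count=2)
master.write("user:1", "alice")
print(master.replicas[0].get("user:1"))  # replica has not seen the write yet
master.replicate()
print(master.replicas[0].get("user:1"))  # replica has caught up
```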
Types of data replication methods
- https://github.com/donnemartin/system-design-primer#master-slave-replication
- https://github.com/karanpratapsingh/system-design#database-replication
Single-Master Replication
Also known as master-slave replication. The master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.
Use cases
- As a data warehouse: the slave node can be used for report generation
- As a CDN: place slave nodes close to end users
Disadvantage
- Additional logic is needed to promote a slave to a master.
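The "additional logic" for routing and promotion might look roughly like the sketch below. The `Cluster` class, its method names, and the node names are hypothetical, and the promotion policy (take the first slave) is a deliberately naive assumption. Writes go to the master, reads are spread over the slaves, and the cluster is read-only between a master failure and a promotion.

```python
class Cluster:
    def __init__(self, master, slaves):
        self.master = master      # node name, or None while the master is offline
        self.slaves = list(slaves)
        self._reads = 0           # counter used for round-robin reads

    def route(self, operation):
        """Return the node that should handle a 'read' or 'write'."""
        if operation == "write":
            if self.master is None:
                raise RuntimeError("read-only mode: no master available")
            return self.master
        if not self.slaves:
            if self.master is None:
                raise RuntimeError("no node available")
            return self.master    # no slaves left, so the master serves reads
        node = self.slaves[self._reads % len(self.slaves)]
        self._reads += 1
        return node

    def master_failed(self):
        self.master = None

    def promote(self):
        """Promote the first slave to master (naive policy for this sketch)."""
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
        return self.master


cluster = Cluster("db-master", ["db-slave-1", "db-slave-2"])
print(cluster.route("write"), cluster.route("read"))
cluster.master_failed()
print(cluster.promote())  # writes are possible again after promotion
```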
Multi-Master Replication
Also known as master-master replication. Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.
Disadvantage
- A load balancer is needed to determine where to write.
- Most master-master systems are either loosely consistent (i.e. lazy and asynchronous, violating ACID) or have increased write latency due to synchronization.
RAID - Redundant Array of Independent Disks
It is a technique that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both.
RAID provides several benefits:
- Increased data reliability: RAID levels that use mirroring or parity provide redundancy, which means that if one disk fails, the data can be recovered from the remaining disks in the array. This makes RAID a reliable storage solution for critical data.
- Improved performance: RAID can improve performance by spreading data across multiple disks.
RAID-0 (Striping)
- Data is split up into blocks that get written across all the drives in the array
- Min 2 drives are required
When you save a file, RAID 0 breaks the data into segments called stripe units and spreads them across all of the drives in your array. This is called striping, and it helps you access data faster because multiple drives work together to read and write the data.
# Pros
* Great performance
# Cons
* Not fault-tolerant
* No redundancy, so there is no second copy of the data to recover from
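Striping can be sketched in a few lines. The drives here are plain Python lists, and the 4-byte stripe unit is an arbitrary choice for readability (real arrays use much larger units). Units are dealt out round-robin, and reading them back in the same order reassembles the data.

```python
def stripe(data: bytes, drive_count: int, stripe_unit: int = 4):
    """Split data into stripe units and deal them round-robin across drives."""
    if drive_count < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    drives = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), stripe_unit):
        unit = data[offset:offset + stripe_unit]
        drives[(offset // stripe_unit) % drive_count].append(unit)
    return drives


def read_back(drives):
    """Reassemble the data by reading one unit from each drive in turn."""
    out = []
    for row in range(max(len(drive) for drive in drives)):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)


drives = stripe(b"ABCDEFGHIJKLMNOPQR", drive_count=2)
print(drives)
print(read_back(drives))
```

Losing either list in `drives` leaves gaps that cannot be filled, which is the "not fault-tolerant" drawback in practice.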
RAID-1 (Mirroring)
- Data is stored twice by writing it to two data drives
- Min 2 drives are required
Use case
- RAID-1 is ideal for mission-critical storage, for instance accounting systems.
- It is also suitable for small servers in which only two data drives will be used.
# Pros
* Fault-tolerant --> Data protection
# Cons
* Usable storage capacity is only half of the total
* Write overhead from writing to two drives
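A minimal model of mirroring is shown below. The `Mirror` class is an illustrative name, each drive is a dict from block number to bytes, and a failed drive is represented by `None`. Every write goes to all drives, so a read still succeeds after one of them fails.

```python
class Mirror:
    def __init__(self, drive_count=2):
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        # Each drive maps block number -> data; None marks a failed drive.
        self.drives = [dict() for _ in range(drive_count)]

    def write(self, block, data):
        """Write the same block to every working drive."""
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def fail(self, index):
        """Simulate a drive failure."""
        self.drives[index] = None

    def read(self, block):
        """Read from the first working drive."""
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise OSError("all drives have failed")


array = Mirror()
array.write(0, b"ledger entry")
array.fail(0)
print(array.read(0))  # still readable from the surviving drive
```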
RAID 10 or RAID 1+0
- Hybrid RAID configuration: RAID 1 + RAID 0
- Data is mirrored across pairs of drives, and the mirrored pairs are then striped
- Min 4 drives are required
If you have at least four drives, RAID 10 gives you more speed than a single drive along with the advantages of redundancy. However, this also means that you have to buy more drives, and you only get half of their total capacity.
# Pros
* High performance
* High fault tolerance
* The rebuild time is very fast
# Cons
* Expensive, since it needs many disks
* Usable capacity is only half of the total
RAID-5 (Block-Level Striping with Distributed Parity)
Striping with one parity
- 3 or more drives are required
- If 4 drives are used
  - Each stripe is divided into 3 data blocks plus 1 parity block, one block per disk
  - The parity block rotates from disk to disk across stripes, so no single disk holds all the parity
When you write data to this array, just like in RAID 0, your data is broken down into units and spread over the drives in your array. In addition to striping the data, it also stores parity bits on the drives. A parity bit is an additional binary digit that helps your array detect errors or missing segments. These bits also serve as redundancy: a lost block can be recomputed from the remaining blocks and the parity.
# Pros
* Only 1 disk's worth of capacity is used for redundancy
* Can handle max `1 disk failure`
# Cons
* This is complex technology
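Single parity in RAID 5 is computed with XOR, and a short sketch shows why that is enough to survive one disk failure. The block contents and the `xor_blocks` helper are made up for this example. XOR-ing the surviving blocks with the parity block reproduces the missing block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-drive array: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the second block, then rebuild it
# from the two surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```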
RAID-6 (Block-Level Striping with Double Parity)
Striping with double parity
- 4 or more drives are required
- If 4 drives are used
  - Each stripe is divided into 2 data blocks plus 2 parity blocks, one block per disk
  - The parity blocks are distributed across the disks, as in RAID 5
# Pros
* Can handle max `2 disk failure`
# Cons
* Writes are slower than in RAID 5 due to the additional parity
* This is complex technology
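The capacity trade-offs of the levels covered in this section can be compared with a small calculator. The `usable_capacity` function is an illustrative helper that assumes all drives are the same size and, for RAID 1, that every drive holds a full copy of the data.

```python
MIN_DRIVES = {"0": 2, "1": 2, "10": 4, "5": 3, "6": 4}


def usable_capacity(level: str, drives: int, drive_size_tb: float) -> float:
    """Usable capacity in TB for an array of equally sized drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return drives * drive_size_tb          # no redundancy
    if level == "1":
        return drive_size_tb                   # every drive is a full copy
    if level == "10":
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return (drives // 2) * drive_size_tb   # half is used for mirrors
    if level == "5":
        return (drives - 1) * drive_size_tb    # one drive's worth of parity
    return (drives - 2) * drive_size_tb        # RAID 6: two drives' worth


for level in ("0", "10", "5", "6"):
    print(f"RAID {level}: {usable_capacity(level, 4, 2.0)} TB usable from 4 x 2 TB")
```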
What about RAID levels 2, 3, 4 and 7?
- These levels do exist but are not that common.