# Topics covered
* Performance
* Latency, Throughput, Bandwidth, Response Time
* Consistency, Availability, and Partition Tolerance (CAP)
* CAP Theorem
* Failure & Fault Tolerance
# Performance vs scalability
Performance refers to the responsiveness and speed of a system. It measures how quickly a system can execute its intended function within given time constraints.

Scalability refers to the capability of a system to grow or shrink its capacity as the load increases or decreases. A scalable system can handle growing demand and increasing load without a significant impact on performance.

Another way to look at performance vs scalability:
- If you have a performance problem, your system is slow for a single user.
- If you have a scalability problem, your system is fast for a single user but slow under heavy load.
# Performance Metrics to evaluate a system
* Throughput
* Bandwidth
* Latency
* Response Time
## Latency vs Response Time
Latency is the time it takes for data to pass from one point on a network to another, i.e. the time spent on the network.
$$
\text{Latency} = \text{Time Spent on the Network}
$$
Response time refers to the total time it takes for a system to respond to a request, including the time spent processing the request. It includes both the latency and the processing time.
$$
\text{Response Time} = \text{Latency} + \text{Processing Time}
$$
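
As a minimal sketch of the formula above, the snippet below adds an assumed network latency to a measured processing time. The names `handle_request` and `response_time`, and the 40 ms latency figure, are hypothetical and chosen only for illustration.

```python
import time


def handle_request(n: int) -> int:
    """Hypothetical request handler: stands in for server-side processing."""
    return sum(i * i for i in range(n))


def response_time(latency_s: float, processing_s: float) -> float:
    """Response time = latency + processing time (both in seconds)."""
    return latency_s + processing_s


# Measure the processing time of one call to the hypothetical handler.
start = time.perf_counter()
handle_request(100_000)
processing_s = time.perf_counter() - start

# Assumed network latency for the request; 40 ms is an illustrative value.
latency_s = 0.040

total_s = response_time(latency_s, processing_s)
print(f"latency={latency_s * 1000:.1f} ms, "
      f"processing={processing_s * 1000:.1f} ms, "
      f"response={total_s * 1000:.1f} ms")
```

The printed processing time varies by machine, but the response time is always at least the latency.
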
## Latency vs throughput
Throughput is the number of actions performed, or the amount of work done, per unit of time.
$$
\text{Throughput} = \frac{\text{Load}}{\text{Time Taken}} = \frac{\text{Work Done}}{\text{Time Taken}}
$$
Generally, you should aim for maximal throughput with acceptable latency.
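
The throughput formula can be sketched in a few lines of Python. The helper names `throughput` and `measure_throughput` are hypothetical, and the lambda is only a stand-in for a real unit of work.

```python
import time


def throughput(work_done: int, time_taken_s: float) -> float:
    """Throughput = work done / time taken (operations per second)."""
    if time_taken_s <= 0:
        raise ValueError("time_taken_s must be positive")
    return work_done / time_taken_s


def measure_throughput(operation, count: int) -> float:
    """Run `operation` `count` times and return operations per second."""
    start = time.perf_counter()
    for _ in range(count):
        operation()
    elapsed = time.perf_counter() - start
    return throughput(count, elapsed)


# 1200 requests served in 60 seconds is 20 requests per second.
print(throughput(1200, 60.0))  # 20.0

# Measured throughput of a stand-in unit of work; the value varies by machine.
ops_per_s = measure_throughput(lambda: sum(range(1000)), count=10_000)
print(f"{ops_per_s:.0f} ops/s")
```
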
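## Bandwidth vs Throughput
Bandwidth is the maximum data capacity of a network, or how much data can potentially travel from one point to another in a given time. Throughput is how much data actually gets through in that time, so it can never exceed the bandwidth.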
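
To make the distinction concrete, here is a small sketch that compares an observed transfer against an assumed link capacity. The 100 Mbps bandwidth and the 50 MB transfer are made-up numbers, and both function names are hypothetical.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Observed throughput in megabits per second (1 Mb = 1_000_000 bits)."""
    return bytes_transferred * 8 / seconds / 1_000_000


def utilization(throughput: float, bandwidth: float) -> float:
    """Fraction of the link's capacity that was actually used."""
    return throughput / bandwidth


bandwidth_mbps = 100.0                        # assumed link capacity
observed = throughput_mbps(50_000_000, 5.0)   # 50 MB moved in 5 seconds

print(observed)                               # 80.0
print(utilization(observed, bandwidth_mbps))  # 0.8
```
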
## Performance Metrics of components
- Application
  - API response time
  - Throughput of API
  - Error occurrence
- Database
  - Time taken by various DB queries
  - Number of queries executed per unit time (throughput)
- Cache
  - Latency of writing to cache
  - Number of cache evictions and invalidations
  - Memory of cache instance
- Message Queues
  - Rate of production and consumption
  - Fraction of stale or unprocessed messages
- Workers
  - Time taken for job completion
  - Resources used in processing

Performance management tools: New Relic, Datadog, SolarWinds
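
As a rough illustration of the application-level metrics listed above, the sketch below derives average response time, throughput, and error rate from a handful of request records. `RequestRecord`, `summarize`, and the sample values are all hypothetical; this is not how any of the named tools work internally.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_s: float   # time taken to serve the request
    ok: bool            # False if the request ended in an error


def summarize(records: list[RequestRecord], window_s: float) -> dict[str, float]:
    """Compute average response time, throughput, and error rate for a window."""
    total = len(records)
    if total == 0:
        return {"avg_response_s": 0.0, "throughput_rps": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in records if not r.ok)
    return {
        "avg_response_s": sum(r.duration_s for r in records) / total,
        "throughput_rps": total / window_s,
        "error_rate": errors / total,
    }


# Made-up sample: four requests observed over a 2-second window.
sample = [
    RequestRecord(0.10, True),
    RequestRecord(0.20, True),
    RequestRecord(0.30, False),
    RequestRecord(0.20, True),
]
print(summarize(sample, window_s=2.0))
```
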
# Consistency, Availability, and Partition Tolerance
https://github.com/karanpratapsingh/system-design#cap-theorem

## Consistency
Every read receives the most recent write or an error.

Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be forwarded or replicated to all the other nodes in the system before the write is deemed “successful”.
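
The replicate-before-success rule can be sketched with a toy in-memory model. `Node` and `consistent_write` are hypothetical names, and the model ignores concurrency and failures that happen partway through a write.

```python
class Node:
    """Hypothetical in-memory replica holding a key-value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.reachable = True

    def apply(self, key: str, value: str) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def consistent_write(nodes: list[Node], key: str, value: str) -> bool:
    """Report success only if every replica accepted the write."""
    # Check reachability first so a failed write leaves no replica updated.
    if not all(node.reachable for node in nodes):
        return False
    for node in nodes:
        node.apply(key, value)
    return True


nodes = [Node("a"), Node("b"), Node("c")]
print(consistent_write(nodes, "x", "1"))   # True
print([n.data.get("x") for n in nodes])    # ['1', '1', '1']

nodes[2].reachable = False
print(consistent_write(nodes, "x", "2"))   # False
print([n.data.get("x") for n in nodes])    # ['1', '1', '1']
```

When one replica is unreachable the write is rejected, so every node still returns the same value.
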
## Availability
Every request receives a response, without guarantee that it contains the most recent version of the information.

Availability in a distributed system means that every request reaching a working node gets a response, even when other nodes are down.
## Partition tolerance
The system continues to operate even when messages between nodes are dropped or delayed, for example because of a link failure.

A partition-tolerant system can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
# CAP Theorem
https://www.educative.io/answers/what-is-the-cap-theorem

Consistency, Availability, and Partition Tolerance (CAP) is a concept in distributed computing that describes the trade-offs between three desirable properties of a distributed system.

The CAP theorem (also called Brewer’s theorem) states that a distributed database system cannot provide all three guarantees simultaneously; it can guarantee at most two of Consistency, Availability, and Partition Tolerance.
## Consistency-Availability Tradeoff
We live in a physical world and can’t guarantee the stability of a network, so distributed databases must choose Partition Tolerance (P). This implies a tradeoff between Consistency (C) and Availability (A).

A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, and therefore can’t deliver fault tolerance. Examples: PostgreSQL, MariaDB.

An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available, but those at the wrong end of a partition might return an older version of the data than others. When the partition is resolved, an AP database typically re-syncs the nodes to repair all inconsistencies in the system. Examples: Apache Cassandra, CouchDB.
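
The AP behaviour described above (stale reads during a partition, then a re-sync) can be sketched with two toy replicas. `Replica` and `resync` are hypothetical, and the "highest version wins" rule is an assumption made for this example; real AP databases use their own conflict-resolution strategies.

```python
from typing import Optional


class Replica:
    """Hypothetical replica storing a (version, value) pair per key."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[int, str]] = {}

    def write(self, key: str, value: str) -> None:
        version = self.store.get(key, (0, ""))[0] + 1
        self.store[key] = (version, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None


def resync(a: Replica, b: Replica) -> None:
    """After the partition heals, keep the higher-versioned value for each key."""
    for key in set(a.store) | set(b.store):
        newest = max(a.store.get(key, (0, "")), b.store.get(key, (0, "")))
        a.store[key] = b.store[key] = newest


left, right = Replica(), Replica()

# Before the partition, a write reaches both replicas.
for replica in (left, right):
    replica.write("cart", "1 item")

# During the partition, only `left` sees the new write.
left.write("cart", "2 items")
print(left.read("cart"))    # 2 items
print(right.read("cart"))   # 1 item (stale, but still available)

# Once the partition is resolved, the replicas converge.
resync(left, right)
print(right.read("cart"))   # 2 items
```
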
# Failure & Fault Tolerance
1. Understanding types of faults
2. Tolerating faults - continue operating without interruption
3. Making system fail-safe

Example 1
- Faults: Out of memory; hardware not able to handle huge load
- Tolerant: System scaling

Example 2
- Faults: Hardware failure
- Tolerant: System replication

Example 3
- Fault: Bug in the code
- Tolerant: Friendly message in FE

Hardware fault tolerance: Replication
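
Examples 2 and 3 can be combined into one small sketch: try each replica in turn, and fall back to a friendly message if none of them responds. The function names and the message text are hypothetical, and the raised `ConnectionError` only simulates a hardware failure.

```python
from typing import Callable


def fetch_with_failover(replicas: list[Callable[[], str]]) -> str:
    """Try each replica in order; fall back to a friendly message if all fail."""
    for fetch in replicas:
        try:
            return fetch()
        except Exception:
            # Tolerate the fault and move on to the next replica.
            continue
    return "Something went wrong. Please try again later."


def broken_primary() -> str:
    raise ConnectionError("primary is down")  # simulated hardware failure


def healthy_replica() -> str:
    return "profile data"


print(fetch_with_failover([broken_primary, healthy_replica]))  # profile data
print(fetch_with_failover([broken_primary]))  # Something went wrong. Please try again later.
```
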