04-Statistics
Inferential statistics
While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.
Inferential statistics have two main uses:
- Making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
- Testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).
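As a rough illustration of the first use (estimating a population value from a sample), here is a minimal sketch that builds a 95% confidence interval for a population mean. The scores are made-up sample data, the function name is hypothetical, and the interval uses a normal approximation; for a sample this small a t-distribution would be the more careful choice.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, based on the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical SAT scores from a small sample of students (illustrative data only).
scores = [1050, 1120, 980, 1200, 1100, 1010, 1150, 1080]
low, high = mean_confidence_interval(scores)
print(f"sample mean = {statistics.fmean(scores):.1f}")
print(f"95% interval for the population mean: ({low:.1f}, {high:.1f})")
```

The sample mean is a point estimate; the interval expresses how uncertain that estimate is when generalizing to the whole population.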
03-Statistics
Probability distributions
3.1 Probability Distribution Function
A probability distribution is a mathematical function that describes the probability of different possible outcomes for an experiment.
Probability distributions are often depicted using graphs or probability tables.
Probability distribution functions can be categorized into:
- Probability Density Function (PDF)
- Probability Mass Function (PMF)
- Cumulative Distribution Function (CDF)
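To make the three categories concrete, the sketch below evaluates a PMF and CDF for a discrete variable (number of heads in 4 fair coin flips) and a PDF and CDF for a continuous one (the standard normal). The helper names `binomial_pmf` and `binomial_cdf` are illustrative, not from any library; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): the CDF is the running sum of the PMF."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Discrete case: number of heads in 4 fair coin flips.
pmf_2_heads = binomial_pmf(2, 4, 0.5)  # exactly 2 heads
cdf_2_heads = binomial_cdf(2, 4, 0.5)  # at most 2 heads

# Continuous case: standard normal distribution.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
density_at_zero = standard_normal.pdf(0.0)  # a density, not a probability
prob_below_zero = standard_normal.cdf(0.0)  # P(X <= 0)

print(pmf_2_heads, cdf_2_heads)
print(density_at_zero, prob_below_zero)
```

Note that a PMF value is itself a probability, while a PDF value is a density: probabilities for a continuous variable come from the area under the PDF, which is what the CDF accumulates.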
02-Statistics
Covariance, Correlation, Symmetric Distribution, Histogram
2.1 Covariance
Covariance and correlation are very helpful in understanding the relationship between two continuous variables.
Covariance tells whether both variables vary in the same direction (positive covariance) or in the opposite direction (negative covariance).
Covariance(x, y)
$$ \mathrm{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
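The formula can be sketched directly in plain Python. The function name `sample_covariance` and the data below are illustrative assumptions; the two calls show a positive covariance (variables move together) and a negative one (variables move in opposite directions).

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator used in the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations from each mean, then divide by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up data for illustration.
hours = [1, 2, 3, 4, 5]
scores_up = [2, 4, 6, 8, 10]    # rises with hours
scores_down = [10, 8, 6, 4, 2]  # falls as hours rise

print(sample_covariance(hours, scores_up))    # 5.0
print(sample_covariance(hours, scores_down))  # -5.0
```

The sign gives the direction of the relationship; the magnitude depends on the units of both variables, which is why correlation is often used for comparing strength.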
01-Statistics
Descriptive stats, Inferential stats
1.1 Statistics
Statistics is the science of collecting, organizing, and analyzing data.
- Used in the decision-making process
Data
- Facts or pieces of information
1.2 Types of Statistics
- Descriptive stats
  - It consists of organizing and summarizing the data.
- Inferential stats
  - It consists of using data you have measured to form conclusions and make predictions.
  - It uses sample data to draw conclusions about the population.
Data Science RoadMap
Data Science
TL;DR
- Data Acquisition
- Excel, Statistics, Probability, SQL
- Data Preparation
- Python, Pandas, NumPy, Matplotlib/Seaborn
- Exploratory Data Analysis
- Linear Algebra, Pandas
- Data Modeling
- Calculus, Machine Learning, TensorFlow
- Visualization
- Power BI, Tableau
- Deployment
- Heroku, AWS
Flask
Flask is a popular Python web framework, meaning it is a third-party Python library used for developing web applications.
Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications.
Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine.
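To show what "WSGI" means in practice, here is a minimal sketch of a bare WSGI application written with only the standard library. This is not Flask code: a Flask application object exposes this same callable interface to the server, and Flask adds routing, request and response objects, and templating on top. The names `hello_app` and `call_app` are hypothetical helpers for illustration.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A WSGI application: a callable taking the request environ and a start_response callback."""
    body = b"Hello, WSGI!"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # The return value is an iterable of byte strings making up the response body.
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process, without starting a server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid request environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(hello_app)
print(status, body.decode())
```

Because the interface is just a callable, any WSGI server can host any WSGI framework, which is part of why a framework like Flask can stay small.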
04-System Design
Memory & Storage Systems, Database Types, Replication, RAID
# Topics covered
* Memory & Storage Systems
* RAM, ROM, HDD, SSD
* Database Types
* Database replication
* Synchronous replication
* Asynchronous replication
* Single-Master Replication
* Multi-Master Replication
* RAID - Redundant Array of Independent Disks
05-System Design
Database partitioning, Hashing
# Topics covered
* Database partitioning
* Vertical partitioning (aka Normalisation)
* Horizontal partitioning (aka Database Sharding)
* Hashing
* Consistent hashing
03-System Design
Performance, CAP, CAP Theorem, Failure & Fault Tolerance
# Topics covered
* Performance
* Latency, Throughput, Bandwidth, Response Time
* Consistency, Availability, and Partition Tolerance (CAP)
* CAP Theorem
* Failure & Fault Tolerance