5.1 What is S3?
- S3 stands for Simple Storage Service.
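- A storage container, once created, is called an S3 bucket.
- It’s object storage, not block storage.
- It’s a global service: bucket names share one global namespace, although each bucket is created in the Region you choose.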
5.2 Amazon S3 Concepts
- Bucket
  - Buckets are containers for data stored in S3.
  - Bucket names share a universal namespace, i.e. the names must be unique globally, because each bucket name becomes part of a DNS address.
  - By default, buckets are private and all the objects stored in a bucket are also private.
  - Eg:
    - If the object named photos/tree.jpg is stored in the treeimage bucket, then it can be addressed by using the URL http://treeimage.s3.amazonaws.com/photos/tree.jpg.
- Objects
  - Objects are the entities that are stored in an S3 bucket (object-based storage).
  - You can store images, Word files, PDF files, etc.
  - Object size can range from 0 bytes to 5 TB, and there is no limit on the number of objects.
- Key
  - A key is a unique identifier for an object.
  - The name plus path of the object is used as the key.
  - Every object in a bucket is associated with one key.
  - Eg:
    - In the URL http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl, jtp is the bucket name and the key is 2019-01-31/Amazons3.wsdl.
- Regions
  - Choose the geographical Region in which to store the buckets you create.
  - Choose a Region that optimizes latency, minimizes costs, or addresses regulatory requirements.
- Data Consistency Model
  - Amazon S3 replicates the data to multiple servers to achieve high availability.
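The bucket and key examples above can be checked with a short sketch. It assumes the virtual-hosted-style hostname shown in the notes (`<bucket>.s3.amazonaws.com`); the helper name `split_s3_url` is made up for illustration, and Region-specific hostnames are not handled.

```python
from urllib.parse import urlparse

# Hostname suffix used by the URLs in these notes (assumption: no Region in the host).
S3_SUFFIX = ".s3.amazonaws.com"


def split_s3_url(url: str) -> tuple[str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, key)."""
    parsed = urlparse(url)
    host = parsed.netloc
    if not host.endswith(S3_SUFFIX):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    bucket = host[: -len(S3_SUFFIX)]   # everything before the S3 suffix
    key = parsed.path.lstrip("/")      # the path without its leading slash
    return bucket, key


print(split_s3_url("http://treeimage.s3.amazonaws.com/photos/tree.jpg"))
print(split_s3_url("http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl"))
```

The key keeps its slashes: S3 has no real folders, so "photos/" is just part of the key.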
5.3 AWS S3 Setup
Create S3 bucket
- Bucket name:
  - Name must be globally unique
  - Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-)
  - Bucket names must begin and end with a letter or number
- Select: AWS Region
- Default: Block all public access
- Default: Disable Bucket Versioning
- Tag
- Create
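The naming rules above can be pre-checked locally before opening the console. This is a minimal sketch: `is_valid_bucket_name` is a hypothetical helper, the 3 to 63 character length limit is an extra S3 rule not listed in these notes, and the check cannot tell whether a name is already taken globally.

```python
import re

# Lowercase letters, digits, dots, and hyphens only;
# the first and last characters must be a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check the character and length rules only (not global uniqueness)."""
    if not 3 <= len(name) <= 63:  # length limit: assumption added to the notes' rules
        return False
    return _BUCKET_NAME.fullmatch(name) is not None


for candidate in ["my-bucket.2024", "MyBucket", "-bucket", "ab"]:
    print(candidate, is_valid_bucket_name(candidate))
```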
5.4 Object access
If the bucket is private
- All objects remain private; their permissions can’t be changed to public
- A presigned URL (which can be signed with temporary STS credentials) can give access to a specific object for a limited time
If the bucket is public
- Only then can individual objects be made public
Making an AWS S3 Bucket/Object Public
- Make bucket public
  - MyBucket > Permissions > Block public access
  - Uncheck Block all public access, save and confirm
- Make object public
  - MyBucket > Object > Permissions > ACL > Edit
  - Everyone (public access) > Read
  - Tick checkbox and save
- By default make all objects public
  - All old and newly added objects will become public automatically
  - MyBucket > Permissions > Bucket policy > Edit
  - Paste the policy here
Generate bucket policies
- Way-1: AWS Policy Generator
  - To make every added object public automatically, use:
  - Principal: *
  - Action: GetObject
  - ARN: arn:aws:s3:::Bucket-Name/*
- Way-2: Add a bucket policy
  - Copy and paste the policy
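The Principal, Action, and ARN values above combine into a policy document roughly like the one this sketch prints. The function name and the `Sid` label are illustrative choices, and `Bucket-Name` is the placeholder from the notes; replace it with a real bucket name before pasting the output into the bucket policy editor.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Build a bucket policy that lets anyone read every object in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",   # free-form label (illustrative)
                "Effect": "Allow",
                "Principal": "*",               # everyone
                "Action": "s3:GetObject",       # read objects only
                "Resource": f"arn:aws:s3:::{bucket_name}/*",  # all keys in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("Bucket-Name"))
```

The policy only takes effect once Block all public access has been unchecked for the bucket.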
5.5 S3 update meta-data
Select objects > Action > Edit metadata > Add metadata
- You can also add metadata during upload
- Eg: System defined: Cache-Control: max-age=31536000 (one year, in seconds)
5.6 Versioning
- Versioning is a means of keeping multiple versions of an object in the same S3 bucket.
- Versioning can be used to preserve, retrieve, and restore every version of an object in the S3 bucket.
When working with S3 Versioning in Amazon S3 buckets, the MFA Delete (multi-factor authentication) feature adds another layer of security to buckets.
MyBucket > Properties > Bucket Versioning > Edit
- NOTE:
  - Once you enable versioning on a bucket, it cannot be disabled; it can only be suspended.
  - Only new objects will be tracked
  - To enable replication, versioning must be enabled (encryption does not depend on versioning).
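To make the versioning behavior concrete, here is a toy in-memory model. It is not the S3 API: the class `VersionedBucket`, its methods, and the `v1`, `v2` style version IDs are all invented for illustration. It shows that an overwrite adds a version, a delete only adds a delete marker, and older versions stay retrievable by ID.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (hypothetical, not the S3 API)."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)   # simple counter standing in for version IDs

    def put(self, key, body):
        """Store a new version; earlier versions are kept."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """Add a delete marker (body None) instead of removing data."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, None))
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one when version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)      # missing, or hidden by a delete marker
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("notes.txt", "draft")
bucket.put("notes.txt", "final")
bucket.delete("notes.txt")
print(bucket.get("notes.txt", version_id=first))
```

After the delete, a plain `get("notes.txt")` fails in this model, but the first version is still restorable by its ID.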
5.7 Event Notification
MyBucket > Properties > Event notifications > Edit
- You can enable certain Amazon S3 bucket events to send a notification message to a destination whenever those events occur.
- Event types
  - All object create events
  - All object removal events, etc.
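Outside the console, the same settings are usually expressed as a notification configuration document. The sketch below assumes the destination is an SQS queue and follows the structure I understand the S3 API to use (`QueueConfigurations`, `QueueArn`, `Events`); the helper name and the queue ARN are placeholders, so verify the shape against the official reference before relying on it.

```python
def notification_config(queue_arn: str) -> dict:
    """Build a notification configuration for create and removal events."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": [
                    "s3:ObjectCreated:*",   # all object create events
                    "s3:ObjectRemoved:*",   # all object removal events
                ],
            }
        ]
    }


# Placeholder ARN for illustration only.
config = notification_config("arn:aws:sqs:us-east-1:123456789012:my-queue")
print(config["QueueConfigurations"][0]["Events"])
```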
5.8 Requester pays
MyBucket > Properties > Requester pays
- In general, bucket owners pay for
  - All Amazon S3 storage and
  - Data transfer costs
- When Requester pays is enabled
  - The requester pays for the data transfer and the request
  - The bucket owner pays for data storage
NOTE:
- Anonymous access to the bucket is disabled
5.9 Server access logging
MyBucket > Properties > Server access logging
- Server access logging provides detailed records for the requests that are made to a bucket.
- Logs record who uploaded, accessed, or downloaded objects, and which requests were denied
NOTE:
- The source and target buckets should be different
  - Otherwise recursion may occur: every log written to the bucket triggers another log entry
- Check target bucket
  - MyBucket > Permissions > Access control list (ACL) > S3 log delivery group
  - Write, Read will be added
5.10 AWS Storage Classes
S3 Standard
- Default storage class
- Low latency and high throughput performance
- Designed for 99.99% availability over a given year
S3 Intelligent-Tiering
S3 Standard IA
- Used when data is accessed less frequently but requires rapid access when needed.
- It has a lower storage fee than S3 Standard, but you will be charged a retrieval fee.
S3 One Zone-Infrequent Access
- Same as S3 Standard IA, but with a single AWS Availability Zone
- Data stored in this storage class will be lost in the event of Availability Zone destruction.
- For this reason, it costs 20% less than the Standard IA storage class.
- It is a good choice for storing the backup data.
S3 Glacier Instant Retrieval
- It is ideal for data that is accessed once or twice per quarter, and that requires immediate access.
S3 Glacier Flexible Retrieval (Formerly S3 Glacier)
- For archive data that is accessed once or twice per year
S3 Glacier Deep Archive
- Lowest cost storage class
- Retrieval time within 12 hours
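The access patterns above can be turned into a rough chooser. This is only a sketch of the notes' rules of thumb: the function name and parameters are hypothetical, the cut-off of 12 accesses per year for the Infrequent Access classes is my own assumption, and Intelligent-Tiering is left out because the notes give no criteria for it. The returned strings are the storage class identifiers used by the S3 API.

```python
def suggest_storage_class(
    accesses_per_year: float,
    needs_immediate_access: bool = True,
    single_az_ok: bool = False,
) -> str:
    """Suggest a storage class from how often the data is read (rule of thumb)."""
    if not needs_immediate_access:
        if accesses_per_year < 1:
            return "DEEP_ARCHIVE"   # lowest cost, retrieval within hours
        if accesses_per_year <= 2:
            return "GLACIER"        # Glacier Flexible Retrieval: 1-2 times per year
    if accesses_per_year <= 8:
        return "GLACIER_IR"         # once or twice per quarter, immediate access
    if accesses_per_year <= 12:     # assumption: about monthly counts as infrequent
        return "ONEZONE_IA" if single_az_ok else "STANDARD_IA"
    return "STANDARD"               # default class for frequently accessed data


print(suggest_storage_class(6))
print(suggest_storage_class(0.5, needs_immediate_access=False))
```

Real decisions should also weigh retrieval fees and minimum storage durations, which this sketch ignores.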
5.11 S3 Transfer Acceleration
- Amazon S3 Transfer Acceleration is a bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
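Once acceleration is enabled on a bucket, clients send requests to a dedicated accelerate endpoint instead of the regular one. A minimal sketch of building that hostname follows; the helper name is made up, and the restriction on dots in the bucket name reflects my understanding of the feature's requirements.

```python
def accelerate_endpoint(bucket_name: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket_name:
        # Acceleration requires DNS-compliant bucket names without periods.
        raise ValueError("bucket names with dots cannot use Transfer Acceleration")
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("treeimage"))
```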
5.12 Replicating objects
- Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets.
- Destination buckets can be in different accounts or AWS Regions, and a rule can target a single destination bucket or multiple ones
- Both source and destination buckets must have versioning enabled
Bucket > Management > Replication rules
- Types
- Same-Region Replication (SRR)
- Cross-Region Replication (CRR)
- S3 Batch Replication
When to use Cross-Region Replication (CRR)
- Compliance Requirements
  - By default, Amazon S3 stores data across multiple Availability Zones within a single Region to keep the data available.
  - Sometimes compliance requirements dictate that a copy of the data be stored in a specific, more distant Region.
- Minimize Latency
  - Suppose your customers are in two geographical regions.
  - To minimize latency, maintain copies of the data in the AWS Regions that are geographically closer to your users.
- Maintain object copies under different ownership
5.13 Querying S3
- Queries can be run using
  - S3 Select
  - Amazon Athena
  - Amazon Redshift
S3 Select
- Lets you retrieve only a subset of data from an S3 object by using simple SQL expressions
- S3 Select works on objects stored in CSV, JSON, or Apache Parquet format
MyBucket > Select obj > Action > Query with S3 Select
Amazon Athena
- You can use Athena to process unstructured, semi-structured, and structured data sets.
5.14 Lifecycle Management
MyBucket > Management > Lifecycle rules
- Use lifecycle rules to define actions you want Amazon S3 to take during an object’s lifetime, such as
  - transitioning objects to another storage class,
  - archiving them, or
  - deleting them after a specified period of time
- Can be applied to all objects in the bucket or to a limited subset
- Versioning is required only for rules that act on noncurrent object versions
The lifecycle defines two types of actions:
- Transition actions
  - Define when objects transition to another storage class.
  - For example
    - you might transition objects to the Standard IA storage class 30 days after you created them
    - or archive objects to the Glacier storage class 60 days after you created them.
- Expiration actions
  - Define when objects expire; Amazon S3 then deletes the expired objects on your behalf.
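The 30-day and 60-day transitions from the example can be written as a lifecycle rule and traced by object age. The dictionary below follows the rule structure I understand the S3 API to accept; the rule ID, the empty prefix (meaning all objects), and the 365-day expiration are assumptions added for illustration, and `state_at_age` is a hypothetical helper that only simulates the outcome locally.

```python
# Lifecycle rule mirroring the example: Standard IA at 30 days, Glacier at 60 days.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-then-expire",       # illustrative name
            "Filter": {"Prefix": ""},          # empty prefix: apply to all objects
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},       # assumption: delete after one year
        }
    ]
}


def state_at_age(rule: dict, age_days: int) -> str:
    """Return the storage class an object would be in at a given age, or EXPIRED."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return "EXPIRED"
    current = "STANDARD"
    # Apply every transition whose day threshold has been reached, in order.
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
for age in (10, 30, 60, 365):
    print(age, state_at_age(rule, age))
```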
5.15 CloudFront CDN
- CloudFront is a CDN (Content Delivery Network)
- Securely delivers content with low latency and high transfer speeds
- It is a system of distributed servers that deliver web pages and other web content to a user based on the user’s geographic location.
Key Terminology of CloudFront CDN
- Edge Location
  - The location where the content is cached.
- Origin
  - The source of all the files that the CDN will distribute.
- Distribution
  - The name given to the CDN, which consists of a collection of edge locations.
  - Creating a new CDN with AWS means creating a Distribution.
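The relationship between an edge location and the origin can be sketched as a simple cache. This is a toy model, not CloudFront's behavior in detail: the class name is invented, the origin is a plain dictionary standing in for an S3 bucket, and there is no expiry (TTL) or invalidation.

```python
class EdgeLocation:
    """Toy edge cache: serve from cache, fall back to the origin on a miss."""

    def __init__(self, origin: dict):
        self._origin = origin        # stands in for the origin (e.g. an S3 bucket)
        self._cache = {}
        self.origin_fetches = 0      # how many times the origin was contacted

    def get(self, path: str) -> str:
        if path not in self._cache:
            # Cache miss: fetch once from the origin and keep a copy at the edge.
            self.origin_fetches += 1
            self._cache[path] = self._origin[path]
        return self._cache[path]


origin = {"/index.html": "<h1>Hello</h1>"}
edge = EdgeLocation(origin)
edge.get("/index.html")   # first request goes to the origin
edge.get("/index.html")   # second request is served from the edge cache
print(edge.origin_fetches)
```

Only the first request pays the round trip to the origin, which is where the lower latency for nearby users comes from.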