5.1 What is S3?

  • S3 stands for Simple Storage Service.
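  • Storage containers created in S3 are called buckets.
  • It’s object storage, not block storage.
  • The S3 console and bucket namespace are global, but each bucket is created in a specific AWS Region.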

5.2 Amazon S3 Concepts

  1. Bucket
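    • Buckets are containers for data stored in S3.
    • Bucket names share a universal namespace, i.e. names must be globally unique.
      • This is because each bucket name becomes part of a DNS address.
      • Eg: a bucket named my-bucket in us-east-1 is reachable at https://my-bucket.s3.us-east-1.amazonaws.com
    • By default, buckets are private, and all the objects stored in a bucket are also private.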
  2. Objects
    • Objects are the entities stored in an S3 bucket (S3 is object-based storage).
    • You can store any kind of file: images, Word documents, PDF files, etc.
    • An object can be from 0 bytes to 5 TB in size, and there is no limit on the number of objects in a bucket.
  3. Key
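    • A key is the unique identifier for an object within a bucket.
    • The object's full path plus its name is used as the key.
    • Every object in a bucket is associated with exactly one key.
    • Eg: an object uploaded as cat.jpg into the photos/2023/ folder has the key photos/2023/cat.jpg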
  4. Regions
    • When you create a bucket, you choose the geographical AWS Region where it is stored.
    • Choose a Region that optimizes latency, minimizes costs, or addresses regulatory requirements.
  5. Data Consistency Model
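    • Amazon S3 replicates data across multiple servers (for most storage classes, across multiple Availability Zones) to achieve high availability.
    • S3 provides strong read-after-write consistency: after a successful write or delete, subsequent reads and lists return the latest state.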
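Keys are plain strings, so S3 has no real directories. The "folders" shown in the console are key prefixes. The toy sketch below approximates how a listing with a prefix and a "/" delimiter groups keys, similar to the CommonPrefixes returned by the ListObjectsV2 API. The function name list_keys is hypothetical and is not an AWS call.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Toy approximation of a prefix + delimiter listing (not an AWS API).

    Returns (contents, common_prefixes): keys directly under `prefix`,
    and the distinct sub-"folders" that appear after `prefix`.
    """
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter becomes a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


keys = ["photos/2023/cat.jpg", "photos/2023/dog.jpg", "photos/readme.txt", "notes.txt"]
print(list_keys(keys, prefix="photos/"))
```

Grouping happens only at listing time. The keys themselves stay flat strings.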

5.3 AWS S3 Setup

Create S3 bucket

  • Bucket name:
    • Name must be globally unique
    • Bucket names must be 3 to 63 characters long
    • Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-)
    • Bucket names must begin and end with a letter or number
  • Select: AWS Region
  • Block all public access: enabled by default
  • Bucket Versioning: disabled by default
  • Tags (optional)
  • Create bucket
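The naming rules above can be checked locally before you try to create a bucket. The sketch below covers the character, start/end and 3 to 63 length rules. AWS documents further rules that it does not check, for example that a name must not be formatted like an IP address or contain two adjacent periods. Global uniqueness can only be checked by AWS itself. The function name is hypothetical.

```python
import re

# Lowercase letters, digits, dots and hyphens; must start and end with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic S3 naming rules listed above."""
    return 3 <= len(name) <= 63 and _BUCKET_NAME.fullmatch(name) is not None


for candidate in ["my-app-logs", "My_Bucket", "-bad-start", "ok.example.bucket"]:
    print(candidate, is_valid_bucket_name(candidate))
```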

5.4 Object access

If the bucket is private

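  • All objects remain private; their permissions can’t be changed to public.
  • To share a specific object for a limited time, use a pre-signed URL (it can be signed with temporary credentials from the AWS STS service).

If the bucket is public (Block public access turned off)

  • Only then can individual objects be made public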

Making an AWS S3 Bucket/Object Public

    1. Make bucket public
    • MyBucket > Permissions > Block public access
    • Uncheck Block all public access, save and confirm
    2. Make object public
    • MyBucket > Object > Permissions > ACL > Edit (ACLs must first be enabled under Object Ownership; new buckets have them disabled by default)
    • Everyone (public access) > Read
    • Tick the checkbox and save
    3. Make all objects public by default
    • All existing and newly added objects become public automatically
    • MyBucket > Permissions > Bucket policy > Edit
      • Paste the policy here

Generate bucket policies
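A bucket policy that makes every object publicly readable typically allows the s3:GetObject action for every principal on all objects in the bucket. The sketch below builds that JSON. The bucket name and the Sid value are placeholders. The policy only takes effect once Block public access has been turned off, as in step 1.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Build a bucket policy that lets anyone read (GET) every object in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                             # everyone, including anonymous users
                "Action": "s3:GetObject",                     # read objects only; no list or write
                "Resource": f"arn:aws:s3:::{bucket_name}/*",  # all objects, old and new
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("my-example-bucket"))
```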

5.5 Update S3 metadata

  • Select objects > Actions > Edit metadata > Add metadata
  • You can also add metadata during upload
  • Eg: System defined: Cache-Control: max-age=31536000 (one year, in seconds)
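The Cache-Control value above is just a number of seconds: 365 × 24 × 60 × 60 = 31536000. User-defined metadata is sent as HTTP headers with the x-amz-meta- prefix, and S3 stores those keys in lowercase. The helper names in the sketch below are hypothetical.

```python
def cache_control_max_age(days: int) -> str:
    """Return a Cache-Control value that lets clients cache for `days` days."""
    seconds = days * 24 * 60 * 60
    return f"max-age={seconds}"


def user_metadata_headers(metadata: dict) -> dict:
    """Turn user-defined metadata into x-amz-meta-* headers (keys lowercased, as S3 stores them)."""
    return {f"x-amz-meta-{key.lower()}": value for key, value in metadata.items()}


print(cache_control_max_age(365))
print(user_metadata_headers({"Project": "website"}))
```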

5.6 Versioning

  • Versioning is a means of keeping multiple variants (versions) of an object in the same S3 bucket.
  • Versioning can be used to retrieve, preserve and restore every version of an object in the S3 bucket.

When working with S3 Versioning, enabling MFA Delete (multi-factor authentication) adds another layer of security to buckets. Permanently deleting an object version or changing the bucket's versioning state then requires an MFA code.

  • MyBucket > Properties > Bucket Versioning > Edit
  • NOTE:
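    • Once versioning is enabled on a bucket, it cannot be disabled; it can only be suspended.
    • Objects already in the bucket keep a null version ID; only writes made after enabling get new version IDs.
    • Versioning is a prerequisite for features such as replication and Object Lock; encryption does not require it.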
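The toy model below shows the behaviour described above. Each write adds a version. A plain delete only adds a delete marker, so older versions can still be retrieved by their version ID. The class is a hypothetical in-memory sketch, not an AWS API. Real version IDs are opaque strings rather than v1, v2, and so on.

```python
import itertools


class ToyVersionedBucket:
    """In-memory model of S3 versioning semantics (not an AWS API)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data); None marks a delete marker
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        # A simple DELETE adds a delete marker; earlier versions are kept.
        return self.put(key, None)

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)     # no object, or the latest version is a delete marker
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = ToyVersionedBucket()
v1 = bucket.put("report.txt", "draft")
v2 = bucket.put("report.txt", "final")
bucket.delete("report.txt")
print(bucket.get("report.txt", v1))  # the old version is still retrievable
```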

5.7 Event Notification

  • MyBucket > Properties > Event notifications > Edit
  • You can enable certain Amazon S3 bucket events to send a notification message to a destination (such as an SNS topic, an SQS queue, or a Lambda function) whenever those events occur.
  • Event types
    • All object create events
    • All object removal events, etc.
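A destination such as a Lambda function receives the notification as JSON with a Records list. The sketch below pulls the bucket, key and size out of object-created records. The sample event is hypothetical and trimmed down, since real events carry many more fields. Object keys arrive URL-encoded (spaces become "+"), so the key is decoded with unquote_plus.

```python
import json
from urllib.parse import unquote_plus


def created_objects(event: dict):
    """Yield (bucket, key, size) for each ObjectCreated record in an S3 event."""
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue                                # skip removals and other event types
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])     # keys are URL-encoded in notifications
        yield s3["bucket"]["name"], key, s3["object"].get("size")


sample_event = json.loads("""
{"Records": [
  {"eventName": "ObjectCreated:Put",
   "s3": {"bucket": {"name": "my-example-bucket"},
          "object": {"key": "uploads/my+photo.jpg", "size": 1024}}},
  {"eventName": "ObjectRemoved:Delete",
   "s3": {"bucket": {"name": "my-example-bucket"},
          "object": {"key": "old.txt"}}}
]}
""")

print(list(created_objects(sample_event)))
```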

5.8 Requester pays

  • MyBucket > Properties > Requester pays
  • In general, bucket owners pay for
    • All Amazon S3 storage and
    • Data transfer costs
  • When Requester pays is enabled
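    • The requester pays for –> the requests and the data transfer
    • The bucket owner pays for –> data storage.
    • NOTE:
      • Anonymous access to the bucket is disabled; requesters must be authenticated and must include the x-amz-request-payer header in their requests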
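The billing split can be summarised as a small lookup. The sketch below maps each cost item to the party that pays for it, based on the rules above. It also shows the header that requesters send to acknowledge charges on a Requester Pays bucket. Both function names are hypothetical.

```python
def who_pays(requester_pays: bool) -> dict:
    """Map each S3 cost item to the party billed for it."""
    payer = "requester" if requester_pays else "bucket owner"
    return {
        "storage": "bucket owner",   # storage is always billed to the owner
        "requests": payer,
        "data transfer": payer,
    }


def request_headers(requester_pays: bool) -> dict:
    # On a Requester Pays bucket, requesters acknowledge the charges with this header.
    return {"x-amz-request-payer": "requester"} if requester_pays else {}


print(who_pays(True))
print(request_headers(True))
```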

5.9 Server access logging

  • MyBucket > Properties > Server access logging
  • Server access logging provides detailed records for the requests that are made to a bucket.
    • Logs show who uploaded, accessed, or downloaded objects, including denied requests
  • NOTE:
    • The source and target buckets should be different
    • Otherwise, a logging loop may occur
      • Each log delivery is itself a request to the bucket, so every log write generates another log record
  • Check the target bucket
    • MyBucket > Permissions > Access control list (ACL) > S3 log delivery group
    • Write and Read permissions will have been added
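Each access log record is a single line of space-separated fields. The time is in brackets and the request URI is in quotes. The sketch below tokenizes a line and names the first fields in the documented order: bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, HTTP status, error code. The sample line is hypothetical and shortened, since real records contain more fields.

```python
import re

# A token is a [bracketed] time, a "quoted" string, or a run of non-space characters.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

FIELDS = ["bucket_owner", "bucket", "time", "remote_ip", "requester",
          "request_id", "operation", "key", "request_uri", "http_status",
          "error_code"]


def parse_access_log_line(line: str) -> dict:
    """Map the leading fields of an S3 server access log line to names."""
    tokens = [t.strip('[]"') for t in _TOKEN.findall(line)]
    return dict(zip(FIELDS, tokens))


sample = ('abc123 my-example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
          'abc123 3E57427F3EXAMPLE REST.GET.OBJECT photo.jpg '
          '"GET /my-example-bucket/photo.jpg HTTP/1.1" 403 AccessDenied 243')

entry = parse_access_log_line(sample)
print(entry["operation"], entry["key"], entry["http_status"], entry["error_code"])
```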

5.10 S3 Storage Classes

  • S3 Standard
    • Default storage class
    • Low latency and high throughput performance
    • Designed for 99.99% availability over a given year
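  • S3 Intelligent-Tiering
    • Automatically moves objects between access tiers as access patterns change, to optimize cost
  • S3 Standard-IA (Infrequent Access)
    • Used when data is accessed less frequently but requires rapid access when needed.
    • It has a lower storage price than S3 Standard, but you are charged a retrieval fee.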
  • S3 One Zone-Infrequent Access
    • Same as S3 Standard-IA, but data is stored in a single AWS Availability Zone
    • Data stored in this storage class will be lost if that Availability Zone is destroyed.
      • For this reason, it costs 20% less than the Standard-IA storage class.
      • It is a good choice for secondary backup copies or data that can easily be re-created.
  • S3 Glacier Instant Retrieval
    • Ideal for data that is accessed once or twice per quarter and requires immediate (millisecond) access.
  • S3 Glacier Flexible Retrieval (formerly S3 Glacier)
    • For archive data that is accessed 1 to 2 times per year; retrieval takes from minutes to hours
  • S3 Glacier Deep Archive
    • Lowest-cost storage class
    • Standard retrieval within 12 hours (bulk retrieval within 48 hours)
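The class descriptions above can be read as a rough decision procedure. The sketch below returns the storage-class identifiers used by the S3 API, where GLACIER is Glacier Flexible Retrieval. The numeric thresholds are a hypothetical rule of thumb, not AWS guidance. Real choices also depend on object size, minimum storage durations and retrieval costs.

```python
from typing import Optional


def suggest_storage_class(accesses_per_year: Optional[float],
                          needs_millisecond_access: bool,
                          easily_recreated: bool = False) -> str:
    """Hypothetical rule of thumb mapping an access pattern to an S3 storage class ID."""
    if accesses_per_year is None:
        return "INTELLIGENT_TIERING"    # unknown or changing access pattern
    if accesses_per_year >= 12:
        return "STANDARD"               # frequently accessed data
    if accesses_per_year > 4:
        # Infrequent but needs rapid access; single AZ only if the data can be re-created.
        return "ONEZONE_IA" if easily_recreated else "STANDARD_IA"
    if needs_millisecond_access:
        return "GLACIER_IR"             # about quarterly access, immediate retrieval
    if accesses_per_year >= 1:
        return "GLACIER"                # archive accessed a few times a year
    return "DEEP_ARCHIVE"               # rarely, if ever, accessed


print(suggest_storage_class(2, needs_millisecond_access=False))
```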

5.11 S3 Transfer Acceleration

  • Amazon S3 Transfer Acceleration is a bucket-level feature that enables
    • fast, easy, and secure transfers of files over long distances
    • between your client and an S3 bucket, by routing data through Amazon CloudFront's globally distributed edge locations.
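Once acceleration is enabled, clients send requests to the bucket's s3-accelerate endpoint instead of the regular one. Transfer Acceleration requires DNS-compliant bucket names without periods. The helper below is a hypothetical sketch that builds the endpoint URL and rejects names containing periods.

```python
def accelerate_endpoint(bucket_name: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    # Transfer Acceleration needs DNS-compliant bucket names without periods.
    if "." in bucket_name:
        raise ValueError("bucket names with periods can't use Transfer Acceleration")
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("my-example-bucket"))
```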

5.12 Replicating objects

  • Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets.
    • Destination buckets can be in the same or different AWS accounts and Regions, and there can be one or multiple destination buckets.
  • Both source and destination buckets must have versioning enabled.
  • MyBucket > Management > Replication rules
  • Types
    • Same-Region Replication (SRR)
    • Cross-Region Replication (CRR)
    • S3 Batch Replication (replicates existing objects, which live replication rules do not cover)
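A replication rule names an IAM role that S3 assumes and a destination bucket ARN. Whether the setup is SRR or CRR depends only on the destination bucket's Region. The sketch below builds a single-rule configuration shaped like the PutBucketReplication request body and enforces the versioning prerequisite first. The role ARN, bucket name and rule ID are hypothetical placeholders.

```python
def replication_config(role_arn: str, destination_bucket: str,
                       source_versioned: bool, destination_versioned: bool) -> dict:
    """Build a single-rule replication configuration (shape of the PutBucketReplication body)."""
    # Replication requires versioning on both the source and destination buckets.
    if not (source_versioned and destination_versioned):
        raise ValueError("enable versioning on both buckets before configuring replication")
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects on your behalf
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::123456789012:role/example-replication-role",  # hypothetical role
    "my-destination-bucket",                                     # hypothetical bucket
    source_versioned=True,
    destination_versioned=True,
)
print(config["Rules"][0]["Destination"])
```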

When to use Cross-Region Replication (CRR)

  • Compliance Requirements
    • By default, Amazon S3 stores data across multiple Availability Zones within a single AWS Region to keep it available.
    • Sometimes compliance requirements dictate storing copies of the data at a greater distance, in another specific Region.
  • Minimize Latency
    • Suppose your customers are in two geographical regions.
    • To minimize latency, maintain copies of the data in the AWS Regions that are geographically closest to your users.
  • Maintain object copies under different ownership
    • Eg: replicate to a bucket in another AWS account, so that the destination account owns the copies

5.13 Querying S3

  • Querying can be done using
    • S3 Select
    • Amazon Athena
    • Amazon Redshift

S3 Select

  • Enables you to retrieve only a subset of data from an S3 object by using simple SQL expressions
  • S3 Select works on objects stored in CSV, JSON, or Apache Parquet format.
  • MyBucket > Select object > Actions > Query with S3 Select
  • NOTE: S3 Select is no longer available to new AWS customers; existing customers can continue to use it
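An S3 Select expression refers to the object as S3Object. Column names such as s.name work when the CSV header row is used, which means FileHeaderInfo is set to USE in the request. The sketch below pairs such an expression with a purely local Python stand-in that applies the same filter, to show which subset of rows comes back. The local function is a hypothetical illustration and does not call S3.

```python
import csv
import io

# The kind of SQL expression S3 Select accepts (CSV input with a header row).
EXPRESSION = "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30"


def local_select(csv_text: str, min_age: int) -> list:
    """Local stand-in for EXPRESSION: names of rows whose age is greater than min_age."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader if int(row["age"]) > min_age]


data = "name,age\nalice,34\nbob,27\ncarol,41\n"
print(local_select(data, 30))
```

With S3 Select, the filtering runs inside S3, so only the matching rows are transferred to the client.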

Amazon Athena

  • Athena is a serverless service for querying data in S3 with standard SQL; you can use it to process unstructured, semi-structured, and structured data sets.

5.14 Lifecycle Management

  • MyBucket > Management > Lifecycle rules
  • Use lifecycle rules to define actions you want Amazon S3 to take during an object’s lifetime, such as
    • transitioning objects to another storage class,
    • archiving them, or
    • deleting them after a specified period of time
  • Can be applied to all objects in the bucket or to a subset (filtered by prefix or tags)
  • Versioning is not required, but with versioning enabled, rules can also act on noncurrent versions

Lifecycle rules define two types of actions:

  • Transition actions
    • Define when objects transition to another storage class.
    • For example
      • you might transition objects to the Standard-IA storage class 30 days after creating them
      • or archive objects to the Glacier storage class 60 days after creating them.
  • Expiration actions
    • Define when objects expire; Amazon S3 then deletes the expired objects on your behalf.
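The example above can be written as a lifecycle rule in the shape used by the PutBucketLifecycleConfiguration API. The rule ID, the logs/ prefix and the 365-day expiration are hypothetical additions. The helper below is a toy that reports which storage class (or expiry) the rule implies at a given object age. In practice, S3 applies lifecycle actions asynchronously, so a transition may happen some time after the threshold.

```python
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},           # apply only to keys under logs/
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def class_at_age(rule: dict, age_days: int) -> str:
    """Toy helper: storage class (or 'EXPIRED') that the rule implies at a given age."""
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"
    for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
print([class_at_age(rule, d) for d in (10, 30, 90, 400)])
```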

5.15 CloudFront CDN

  • CloudFront CDN (Content Delivery Network)
  • Securely delivers content with low latency and high transfer speeds
  • It is a system of distributed servers that deliver web pages and other web content to users based on their geographic location.

Key Terminology of CloudFront CDN

  • Edge Location
    • An edge location is a site, close to users, where content is cached.
  • Origin
    • The origin is the source of all the files that the CDN distributes, e.g. an S3 bucket or a web server.
  • Distribution
    • A distribution is the name given to a CDN configuration, which serves content through a collection of edge locations.
    • Creating a new CDN with AWS means creating a distribution.
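The toy model below shows the basic caching idea behind an edge location. It serves a cached copy until the copy's time to live (TTL) expires. On a miss or after expiry, it fetches from the origin. The class and parameter names are hypothetical and are not CloudFront's API. In CloudFront, TTLs are set on the distribution's cache behaviour, and the default TTL is 24 hours.

```python
import time


class ToyEdgeLocation:
    """Toy edge cache: serve a cached copy until its TTL expires, else ask the origin."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin            # callable: path -> content (e.g. an S3 read)
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}                # path -> (content, expires_at)

    def get(self, path):
        now = self.clock()
        cached = self._cache.get(path)
        if cached is not None and cached[1] > now:
            return cached[0], "hit"     # served from the edge; origin not contacted
        content = self.origin(path)     # miss or expired: fetch from the origin
        self._cache[path] = (content, now + self.ttl)
        return content, "miss"


# Usage with a fake clock and an origin that records each fetch.
clock_now = [0.0]
origin_calls = []


def fake_origin(path):
    origin_calls.append(path)
    return f"content of {path}"


edge = ToyEdgeLocation(fake_origin, ttl_seconds=60, clock=lambda: clock_now[0])
print(edge.get("/index.html"))  # first request goes to the origin
print(edge.get("/index.html"))  # second request is served from the cache
```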

5.16 CORS
