S3 (Simple Storage Service)
Created : March 25, 2020
provides secure, durable, highly-scalable object storage.
- manage data as objects
- opposed to file systems(data as files) or block storage(data as blocks)
- serverless storage in the cloud
- not suitable to install an OS or a database on. (files only)
- individual objects : "0" bytes ~ 5TB
- by single PUT request : maximum 5GB
- larger than 100MB : consider multipart uploads
- unlimited storage
- universal namespace : name must be unique globally (like having a domain name)
- receive HTTP 200 code if file upload was successful.
- can turn on MFA(Multi-Factor Authentication) to delete
- Objects contain data like files.
Object consists of
- Key : name of the object
- Value : the data itself made up of a sequence of bytes
- Version ID : the version of object (when versioning enabled)
- Metadata : additional information
- Subresources (Access Control Lists, Torrent)
Objects are stored in Buckets.
- maximum 100 buckets per account
- can increase bucket limit to max 1,000 buckets by submitting a service limit increase.
When create a bucket, choose its name and the Region to create it in.
- After creation, can't change its name or Region.
S3 - Storage Classes
*Order by higher price to lower price
- 99.99% availability, 99.999999999% (11 x 9s) durability
- stored across multiple devices in multiple facilities (at least 3 AZs)
- same price as standard (= more beneficial than standard)
- use machine learning
- optimize costs by automatically moving data to the most cost-effective access tier (without performance impact or operational overhead)
Standard-IA (Infrequently Accessed)
- accessed less frequently
- rapid access when needed
- lower fee than Standard(almost 50% less), charge a retrieval fee
- only exist in one AZ
- 99.5% availability
- cheaper than Standard-IA by 20% less, charge a retrieval fee
- data can be lost (suitable when data can be reproduced easily)
- for long-term backups & data archiving
- retrieval time : configurable - from 1min to 12hrs
Glacier Deep Archive
- when accessed once or twice a year
- retrieval time : within 12 hrs
S3 Lock Policies
S3 Object Lock
to store objects using WORM (Write Once, Read Many) model.
- prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
- to meet regulatory requirements
- or to add an extra protection against object changes and deletion.
- can be on individual objects or applied across the bucket.
- users can't overwrite/delete an object version or alter its lock settings
- still can give permission to alter the retention settings or delete.
- a protected object version can't be overwritten or deleted by any user. (incl. root user)
- can't be changed during retention period.
Glacier Vault Lock
- easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy.
S3 - Performance
mybucketname/folder1/subfolder1/myfile.jpg → Prefix:
- better performance by using deeper level of prefixes.
S3 - Security
- default: all newly created buckets are private.
Access control is configured using
- ACL (Access Control Lists) : simple way of granting access
- Bucket Policies : use a policy to define complex rule access (in JSON format)
Access logs can be configured to create.
- Logging per request can be turned on a bucket.
- Log files are generated and saved in a different bucket. (even a bucket in a different AWS account if desired)
S3 - Encryption
- Encryption In Transit : via SSL/TLS
Encryption At Rest
- Client-Side Encryption: by user (after encrypted by user, and then upload objects to S3)
- Server-Side Encryption (SSE) : by Amazon
Types of SSE (Server-Side Encryption)
key = a way of encrypting & decrypting object
- S3-managed keys
- use AES-256 algorithm
- Envelope encryption via AWS KMS(Key Management Service)
- User manage keys
- Customer provided keys
- user gives own keys to Amazon that user manages and user can encrypt S3 objects
S3 - Data Consistency
Read after Write consistency
- PUTS for new objects
- = when upload a new S3 object, can read immediately after writing.
- PUTS to overwrite or DELETES to delete
- = when overwrite or delete an existing file, it takes time to replicate versions to AZs.
- if read immediately, S3 may return an old copy. Need to generally wait a few seconds before reading.
S3 - Versioning
Store ALL versions of an object (incl. all writes)
- even if delete an object, versions will still exist.
- Great backup tool
- Once enabled, versioning cannot be disabled, only suspended.
- Fully integrate with S3 Lifecycle rules
MFA delete : extra protection
- need to provide MFA code to delete objects
- Must turn on MFA delete only from AWS CLI
- Versioning must be turned on.
- Only the bucket owner logged in as Root user can delete objects.
S3 - Lifecycle Management
Automate the process of moving objects to different storage classes or deleting objects.
- e.g. After 7 days, move to Glacier & after 1 year, permanently delete.
- Manage an object's lifecycle by using a lifecycle rule.
Can be used together with versioning.
- applied to both current & previous versions.
- Automatically expire objects
- Clean up incomplete multipart uploads
S3 - Cross-Region Replication (CRR)
- When enabled, uploaded objects will be automatically replicated to another region(s).
- Versioning required (both source & destination buckets)
- Can have CRR replicate to another AWS account.
- Regions must be unique.
Files in an existing bucket are NOT replicated automatically.
- All subsequent updated files will be replicated automatically.
Delete markers are not replicated.
- Files in replicated bucket still exist even if deleting original files.
- Deleting each versions of original or delete marker doesn't affect replicated bucket.
S3 - Transfer Acceleration
- Fast & secure transfer of files over long distances between end users and S3 bucket.
- Instead of uploading to bucket, users use a distinct URL for an Edge Location nearby.
- As data arrives at the Edge Location, it's automatically routed to S3 bucket over a specially optimized network path. (= Amazon's backbone network)
- Utilizes CloudFront's distributed Edge Locations.
S3 - Presigned URLs
- Generate a URL for temporary access to an object to either upload or download object data.
- Commonly used to provide access to private objects. (expires soon)
- Use AWS CLI or SDK to generate a URL.
< 3 ways to share S3 buckets across accounts >
use Bucket Policies & IAM
- for across entire bucket
- programmatic access only (console X)
use Bucket ACLs & IAM
- ACL = Access Control Lists
- for individual objects
- programmatic access only (console X)
cross-account IAM roles
- programmatic & console access
- always use MFA, strong&complex password on root account
- paying account is for billing purposes only. Don't deploy resources into paying account.
- enable/disable AWS services using SCP(Service Control Policies) either on OU or on individual accounts.
- deliver data using a global network of edge locations
- requests for content are automatically routed to the nearest edge location
- so content is delivered with the best possible performance
< Edge Locations >
- not just READ only (can write to them too)
- objects are cached for the life of the TTL (Time To Live)
- invalidate cache = can clear cached objects (will be charged)
< CDN (Content Delivery Network) >
- a system of distributed servers or network
- deliver webpages/web content to a user
- geographic locations of user
- origin of webpage
- content delivery server
< Tips >
- Origin : origin of all the files that CDN will distribute.
- Distribution : the name given CDN which consists of a collection of edge locations.
- Web Distribution : typically used for Websites
- RTMP : used for media streaming
- Physically migrate petabyte-scale datasets into/out of AWS
- Use secure appliances (physical briefcase computer) to transfer large amounts of data
- E-Ink display (for shipping information) and tamper, weather proof
- 5x cheaper than high-speed internet
- Comes in 2 sizes : 50TB or 80TB
- Fast transfer speed : takes less than a week for 100TB transfer.
- Use multiple layers of security
- Data is encrypted end-to-end (256-bit encryption)
- For security, data transfers must be completed within 90 days of the Snowball being prepared.
- Can import/export from S3 bucket.
- Similar to Snowball, but with more storage and on-site compute capabilities.
- LCD display (shipping information and other functionality)
- Comes in 2 sizes : 100TB (83TB of usable space) or 100TB Clustered (45TB per node)
- As a temporary storage tier for large local datasets
- Like portable version of AWS
- Exabyte-scale data transfer service
- 45-foot long ruggedized shipping container, pulled by a semi-trailer truck.
- One size : 100PB
- AWS personnel will help to connect network to the Snowmobile,
- and when data transfer is complete, they'll drive it back to AWS to import into S3 or Glacier.
- physical or virtual appliance that can be used to cache S3 locally on site
- enable hybrid cloud storage between on-premises environments and AWS Cloud.
provide low-latency performance by
- caching frequently accessed data on premises,
- storing data in cloud storage.
File Gateway (NFS & SMB)
- for flat files
- stored directly on S3
Volume Gateway (iSCSI)
- entire dataset is stored on site
- asynchronously backed up to S3
- entire dataset is stored on S3
- most frequently accessed data is cached on site
Tape Gateway (VTL - Virtual Tape Library)
- for backup (using popular backup applications)
Athena vs Macie
- interactive query service
- allow you to query data in S3
- use standard SQL
- used to analyse log data in S3
- security service
- use AI (Machine Learning & NLP)
- to discover/classify/protect sensitive data stored in S3
used to recognise if S3 objects contain sensitive data like PII
- PII = Personally Identifiable Information
- personal data used to establish an individual's identity