Amazon S3 Deep Dive — Storage Classes, Versioning, and Python SDK
Amazon S3 (Simple Storage Service) is a scalable object storage service that stores and retrieves any amount of data from anywhere, with 99.999999999% durability and industry-leading security and performance.
What You’ll Learn
By the end of this tutorial, you’ll understand S3 storage classes, versioning, lifecycle policies, presigned URLs, static website hosting, S3 Select, and how to perform common operations with the Python SDK (boto3).
Why Amazon S3 Matters
S3 is the foundation of data storage on AWS. Many other AWS services integrate with it: Lambda reads from it, Athena queries it, Redshift loads from it, and CloudFront distributes content stored in it. AWS has reported that S3 holds more than 100 trillion objects across all customers. DodaTech uses S3 for Durga Antivirus Pro update distribution and Doda Browser analytics data lakes.
Amazon S3 Learning Path
flowchart LR
A[AWS Fundamentals] --> B[Amazon S3]
B --> C{You Are Here}
C --> D[Storage Classes]
C --> E[Versioning]
C --> F[Lifecycle]
C --> G[Security]
D --> H[Standard]
D --> I[Glacier]
G --> J[Presigned URLs]
G --> K[Bucket Policies]
What Is Amazon S3?
Think of S3 like an infinite filing cabinet. Every file (object) goes into a drawer (bucket). Each object has data (the file), metadata (labels), and a unique key (file path). You can access any object from anywhere via URL. Unlike a regular filing cabinet, this one never runs out of space, automatically backs up everything, and can serve files to millions of people simultaneously.
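The filing-cabinet picture can be sketched in a few lines of Python. This is a toy in-memory model, not how S3 is implemented, and the names `ToyObject` and `keys_with_prefix` are made up for illustration. It shows that a bucket behaves like a flat map from key to object, and that "folders" are only shared key prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """Hypothetical stand-in for an S3 object: data plus metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


def keys_with_prefix(bucket: dict, prefix: str) -> list:
    """Return the sorted keys that start with prefix (a 'folder' listing)."""
    return sorted(key for key in bucket if key.startswith(prefix))


# A bucket is a flat mapping of key -> object; slashes are just characters.
bucket = {
    "logs/2026/06/15/app.log": ToyObject(b"INFO: Server started"),
    "logs/2026/06/16/app.log": ToyObject(b"INFO: Server stopped"),
    "data/sales_2026.csv": ToyObject(b"order_id,amount", {"content-type": "text/csv"}),
}

print(keys_with_prefix(bucket, "logs/"))
```

Because the namespace is flat, listing a "folder" is a prefix filter over keys rather than a directory lookup.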
Storage Classes
S3 offers storage classes optimized for different access patterns.
| Class | Durability | Availability | Min Storage Duration | Retrieval Time | Use Case |
|---|---|---|---|---|---|
| S3 Standard | 99.999999999% | 99.99% | None | Milliseconds | Frequently accessed data |
| S3 Intelligent-Tiering | 99.999999999% | 99.9% | None | Milliseconds (optional archive tiers take minutes to hours) | Unknown or changing access patterns |
| S3 Standard-IA | 99.999999999% | 99.9% | 30 days | Milliseconds | Infrequent access (about once a month or less) |
| S3 One Zone-IA | 99.999999999% (single AZ) | 99.5% | 30 days | Milliseconds | Recreatable data |
| S3 Glacier Instant | 99.999999999% | 99.9% | 90 days | Milliseconds | Archive with instant access |
| S3 Glacier Flexible | 99.999999999% | 99.99% | 90 days | 1-5 min (expedited), 3-5 hours (standard), 5-12 hours (bulk) | Backup archives |
| S3 Glacier Deep Archive | 99.999999999% | 99.99% | 180 days | Within 12 hours (standard), within 48 hours (bulk) | Long-term compliance |
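In the SDK, a storage class is chosen per object through the `StorageClass` parameter of `put_object`. The sketch below maps the names used in the table to the API constants; the helper `put_object_args` and the bucket and key names are hypothetical, and nothing is sent to AWS.

```python
# Table name -> value accepted by the StorageClass parameter of put_object.
STORAGE_CLASS_API_NAMES = {
    "S3 Standard": "STANDARD",
    "S3 Intelligent-Tiering": "INTELLIGENT_TIERING",
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier Instant": "GLACIER_IR",
    "S3 Glacier Flexible": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
}


def put_object_args(bucket, key, body, class_name="S3 Standard"):
    """Build keyword arguments for s3.put_object with a storage class."""
    if class_name not in STORAGE_CLASS_API_NAMES:
        raise ValueError(f"Unknown storage class: {class_name}")
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASS_API_NAMES[class_name],
    }


args = put_object_args("example-bucket", "backups/db.dump", b"...", "S3 Standard-IA")
print(args["StorageClass"])  # STANDARD_IA
# With real credentials: boto3.client("s3").put_object(**args)
```

Objects uploaded without `StorageClass` land in S3 Standard, so choosing a cheaper class is always an explicit decision.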
S3 Python SDK Operations
# s3_operations.py
# Common S3 operations with boto3
from datetime import datetime

import boto3
from botocore.exceptions import ClientError

# Initialize client (credentials are only needed once a request is sent)
s3 = boto3.client('s3', region_name='us-east-1')


def list_buckets():
    """List all S3 buckets."""
    response = s3.list_buckets()
    buckets = [b['Name'] for b in response['Buckets']]
    print(f"Buckets: {buckets}")
    return buckets


def upload_file(bucket, key, data):
    """Upload data to S3."""
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    print(f"Uploaded: s3://{bucket}/{key}")


def download_file(bucket, key):
    """Download an object from S3 and decode it as UTF-8 text."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read().decode('utf-8')


def list_objects(bucket, prefix=''):
    """List objects in a bucket, following pagination past 1,000 keys."""
    paginator = s3.get_paginator('list_objects_v2')
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get('Contents', []))
    print(f"Objects in s3://{bucket}/{prefix}:")
    for obj in objects:
        size_kb = obj['Size'] / 1024
        modified = obj['LastModified'].strftime('%Y-%m-%d %H:%M')
        print(f" {obj['Key']:<40} {size_kb:>8.1f}KB {modified}")
    return objects


def generate_presigned_url(bucket, key, expires_in=3600):
    """Generate a presigned URL for temporary access."""
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in,
    )
    print(f"Presigned URL (expires in {expires_in}s):")
    print(f" {url[:80]}...")
    return url


def set_bucket_versioning(bucket, status='Enabled'):
    """Enable or suspend bucket versioning (status: 'Enabled' or 'Suspended')."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': status},
    )
    print(f"Versioning {status.lower()} for {bucket}")


def get_bucket_lifecycle(bucket):
    """Get lifecycle rules for a bucket."""
    try:
        response = s3.get_bucket_lifecycle_configuration(Bucket=bucket)
    except ClientError as e:
        # S3 returns this error code when the bucket has no lifecycle rules
        if e.response['Error']['Code'] == 'NoSuchLifecycleConfiguration':
            print(f"No lifecycle config for {bucket}")
            return []
        raise
    rules = response.get('Rules', [])
    print(f"Lifecycle rules for {bucket}:")
    for rule in rules:
        # The rule ID is optional, so fall back to a placeholder
        print(f" {rule.get('ID', '(no ID)')}: {rule['Status']}")
    return rules

# Simulate operations without AWS
class S3Simulator:
    """Tiny in-memory stand-in for S3 so the demo runs without credentials."""

    def __init__(self):
        self.buckets = {}

    def create_bucket(self, name):
        self.buckets[name] = {}
        print(f"[S3] Created bucket: {name}")

    def put_object(self, bucket, key, data):
        if bucket not in self.buckets:
            raise ValueError(f"Bucket {bucket} doesn't exist")
        self.buckets[bucket][key] = {
            "data": data,
            "size": len(data),
            "last_modified": datetime.now().isoformat(),
        }
        print(f"[S3] Put: s3://{bucket}/{key} ({len(data)} bytes)")

    def get_object(self, bucket, key):
        # Returns None when the bucket or key is missing
        obj = self.buckets.get(bucket, {}).get(key)
        return obj

    def generate_url(self, bucket, key, expires=3600):
        # Illustrative only: a real presigned URL also carries a signature
        return f"https://{bucket}.s3.amazonaws.com/{key}?X-Amz-Expires={expires}"


# Demo
s3_sim = S3Simulator()
print("=== Amazon S3 Operations Demo ===\n")
s3_sim.create_bucket("dodatech-data-lake")
s3_sim.put_object("dodatech-data-lake", "logs/2026/06/15/app.log", "INFO: Server started\nERROR: Connection timeout\nINFO: Retry succeeded")
s3_sim.put_object("dodatech-data-lake", "data/sales_2026.csv", "order_id,amount,status\n1001,250.00,completed\n1002,45.50,pending")
obj = s3_sim.get_object("dodatech-data-lake", "data/sales_2026.csv")
print(f"\nRetrieved object: {obj['size']} bytes")
url = s3_sim.generate_url("dodatech-data-lake", "data/sales_2026.csv")
print(f"Presigned URL: {url}")

print("\n=== S3 Storage Classes Comparison ===")
# Illustrative per-GB monthly prices (us-east-1); check current AWS pricing
classes = {
    "S3 Standard": {"cost_per_gb": 0.023, "retrieval": "Instant", "min_days": 0},
    "S3 Standard-IA": {"cost_per_gb": 0.0125, "retrieval": "Instant", "min_days": 30},
    "S3 Glacier Instant": {"cost_per_gb": 0.004, "retrieval": "Instant", "min_days": 90},
    "S3 Glacier Deep Archive": {"cost_per_gb": 0.00099, "retrieval": "12 hours", "min_days": 180},
}
for name, info in classes.items():
    print(f" {name:<25} ${info['cost_per_gb']:<8} {info['retrieval']:<15} {info['min_days']}d min")

Expected output:
=== Amazon S3 Operations Demo ===

[S3] Created bucket: dodatech-data-lake
[S3] Put: s3://dodatech-data-lake/logs/2026/06/15/app.log (68 bytes)
[S3] Put: s3://dodatech-data-lake/data/sales_2026.csv (63 bytes)

Retrieved object: 63 bytes
Presigned URL: https://dodatech-data-lake.s3.amazonaws.com/data/sales_2026.csv?X-Amz-Expires=3600

=== S3 Storage Classes Comparison ===
 S3 Standard               $0.023    Instant         0d min
 S3 Standard-IA            $0.0125   Instant         30d min
 S3 Glacier Instant        $0.004    Instant         90d min
 S3 Glacier Deep Archive   $0.00099  12 hours        180d min

Static Website Hosting
S3 can host static websites (HTML, CSS, JS) without any server.
# s3_static_site.py
# Configure S3 for static website hosting
import json

import boto3


def configure_static_site(bucket_name):
    """Configure an S3 bucket as a static website."""
    s3 = boto3.client('s3', region_name='us-east-1')

    # Enable static website hosting
    s3.put_bucket_website(
        Bucket=bucket_name,
        WebsiteConfiguration={
            'IndexDocument': {'Suffix': 'index.html'},
            'ErrorDocument': {'Key': '404.html'},
        }
    )

    # Block Public Access rejects public bucket policies, so allow them here
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            'BlockPublicAcls': True,
            'IgnorePublicAcls': True,
            'BlockPublicPolicy': False,
            'RestrictPublicBuckets': False,
        },
    )

    # Make bucket public (or keep it private behind CloudFront with origin access control)
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*"
        }]
    }
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(bucket_policy))
    print(f"Static site ready: http://{bucket_name}.s3-website-us-east-1.amazonaws.com")


# Simulate
print("=== S3 Static Website Hosting ===")
print("""
Files needed:
  index.html → Homepage
  404.html → Error page
  style.css → Styles
  script.js → Scripts
Steps:
  1. Create bucket with same name as domain (or use subdomain)
  2. Enable "Static website hosting" → set index/error docs
  3. Allow public policies in Block Public Access, then set bucket policy for public read
  4. Upload files
  5. (Optional) Point Route 53 domain to S3 endpoint
""")

S3 Select
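S3 Select retrieves a subset of an object's data using a SQL expression, so you don't need to download the entire object. AWS no longer offers S3 Select to new customers (existing customers can keep using it), so check that it is available in your account before building on it.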
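-- S3 Select SQL example
-- Query a CSV in S3 without downloading the whole file
-- Column names such as s.order_id require the header row to be enabled (FileHeaderInfo = USE)
-- CSV fields are read as strings, so cast before comparing numerically
SELECT s.order_id, s.amount
FROM s3object s
WHERE s.status = 'completed'
  AND CAST(s.amount AS FLOAT) > 100
LIMIT 10

Common Amazon S3 Mistakes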
1. Public Buckets with Sensitive Data
Many data breaches have been traced to S3 buckets that were accidentally made public. Turn on Block Public Access at the account level and audit bucket settings with AWS Config.
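A minimal sketch of turning on Block Public Access for one bucket with boto3's `put_public_access_block`. The helper name, the bucket name, and the `RecordingClient` stub are hypothetical; the stub only records the call so the example runs without AWS credentials.

```python
def block_all_public_access(s3_client, bucket):
    """Turn on all four Block Public Access settings for one bucket."""
    config = {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=config,
    )
    return config


class RecordingClient:
    """Hypothetical stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_public_access_block(self, **kwargs):
        self.calls.append(kwargs)


stub = RecordingClient()
block_all_public_access(stub, "example-private-bucket")
print(stub.calls[0]["Bucket"])  # example-private-bucket
```

In real use, pass `boto3.client("s3")` instead of the stub. The account-wide equivalent goes through the `s3control` client, whose `put_public_access_block` takes an `AccountId` instead of a `Bucket`.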
2. Not Enabling Versioning
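Without versioning, accidental deletes or overwrites are permanent. Enable versioning on all production buckets. Every retained version is billed as storage, so pair versioning with a lifecycle rule that expires noncurrent versions; the cost is usually small compared to the cost of data loss.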
3. No Lifecycle Policies
Without lifecycle rules, data accumulates indefinitely. Old logs, temporary exports, and staging files keep driving up costs. Set lifecycle rules to transition objects to cheaper tiers and delete them after their retention period.
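As a rough illustration, a lifecycle rule is a plain dictionary passed to boto3's `put_bucket_lifecycle_configuration`. The builder below is a sketch: the function name, rule ID, prefix, and day counts are assumptions chosen for the example, and it only constructs the configuration without calling AWS.

```python
import json


def build_log_lifecycle(prefix="logs/", ia_after=30, glacier_after=90, delete_after=365):
    """Build a lifecycle configuration that tiers and then expires objects."""
    # S3 does not transition objects to Standard-IA before they are 30 days old.
    if not 30 <= ia_after < glacier_after < delete_after:
        raise ValueError("expected 30 <= ia_after < glacier_after < delete_after")
    return {
        "Rules": [{
            "ID": "tier-and-expire-logs",  # hypothetical rule name
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_after, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": delete_after},
        }]
    }


config = build_log_lifecycle()
print(json.dumps(config, indent=2))
# With real credentials:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule as data makes it easy to review and version-control before it is applied to a bucket.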
4. Using S3 Like a File System
S3 is object storage, not a file system. There is no rename (a rename is a copy followed by a delete), no in-place edit (changing an object means re-uploading it), and it is a poor fit for high-IOPS random access (use EBS or EFS instead).
5. Ignoring Request Pricing
S3 charges per request (PUT, GET, LIST). For workloads with millions of small objects, request costs can exceed storage costs. Batch operations or aggregate objects.
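A quick back-of-the-envelope comparison shows how this plays out. The sketch uses the S3 Standard prices quoted elsewhere in this tutorial ($0.023 per GB-month and $0.005 per 1,000 PUT requests) as illustrative values; the object counts and sizes are made up.

```python
STORAGE_PER_GB = 0.023  # illustrative S3 Standard price per GB-month
PUT_PER_1000 = 0.005    # illustrative S3 Standard price per 1,000 PUTs


def monthly_costs(object_count, object_kb):
    """Return (storage_cost, put_cost) for uploading object_count objects once."""
    storage_gb = object_count * object_kb / 1_000_000  # decimal units: 1 GB = 10^6 KB
    storage_cost = storage_gb * STORAGE_PER_GB
    put_cost = object_count / 1000 * PUT_PER_1000
    return round(storage_cost, 2), round(put_cost, 2)


# 100 GB written as 10 million 10 KB objects: requests dominate
print(monthly_costs(10_000_000, 10))   # (2.3, 50.0)

# The same 100 GB aggregated into 10,000 objects of 10 MB each
print(monthly_costs(10_000, 10_000))   # (2.3, 0.05)
```

The stored volume is identical in both cases; only the number of requests changes, which is why aggregating small objects pays off.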
Practice Questions
1. What is S3’s durability and what does it mean?
S3 is designed for 99.999999999% (11 nines) durability per year: statistically, if you store 10 million objects, you can expect to lose one object every 10,000 years. Except in the One Zone classes, objects are stored redundantly across at least three Availability Zones.
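The 10,000-year figure follows directly from the durability number, treating it as a per-object, per-year survival probability. A small check with exact fractions to avoid floating-point noise:

```python
from fractions import Fraction

# 99.999999999% durability, read as the chance one object survives one year
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - durability  # 1 in 100 billion

objects = 10_000_000
expected_losses_per_year = objects * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(annual_loss_probability)  # 1/100000000000
print(years_per_lost_object)    # 10000
```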
2. What are S3 storage classes and when would you use each?
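Standard (frequent access), Intelligent-Tiering (unknown or changing access patterns), Standard-IA (infrequent access), One Zone-IA (infrequent access to recreatable data), Glacier Instant (archive, millisecond retrieval), Glacier Flexible (backup, retrieval in minutes to hours), Glacier Deep Archive (compliance, retrieval within 12 hours). Choose based on access frequency and retrieval time needs.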
3. How does S3 versioning work?
When enabled, S3 keeps every version of an object (including deletes as delete markers). You can list, retrieve, and restore any previous version. It protects against accidental deletion and overwrites.
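The delete-marker behavior is easier to see in a toy model. The class below is an in-memory sketch, not real S3: `VersionedBucket`, its methods, and the `v1`, `v2` style version IDs are invented for illustration.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not real S3)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def put(self, key, data):
        """Store a new version on top of the stack and return its ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        """A delete without a version ID only adds a delete marker (data=None)."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version; a delete marker on top makes the key look missing.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))

    def restore(self, key):
        """Undo a delete by removing the delete marker on top."""
        versions = self._versions.get(key, [])
        if versions and versions[-1][1] is None:
            versions.pop()


b = VersionedBucket()
v1 = b.put("report.txt", "draft")
v2 = b.put("report.txt", "final")
b.delete("report.txt")
print(b.get("report.txt", v1))  # draft (old versions stay readable)
b.restore("report.txt")
print(b.get("report.txt"))      # final
```

The key point is that a plain delete hides the object rather than destroying it, and removing the marker brings the latest version back.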
4. What are presigned URLs?
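Temporary URLs that grant access to private S3 objects to someone who has no AWS credentials of their own. The URL is signed with the creator's credentials, carries an expiration time, and can never allow more than the creator is permitted to do. Used for secure file sharing, uploads, and downloads.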
5. Challenge: Design an S3 storage strategy for a video platform with hot content (accessed daily), warm content (accessed monthly), and archived content (compliance, accessed rarely). Cost-optimize the storage.
Keep hot content in Standard for the first 30 days (lifecycle rules cannot move objects to Standard-IA any earlier), transition to Standard-IA at 30 days for warm content, to Glacier Instant at 90 days, and to Glacier Deep Archive at 1 year for compliance. Lifecycle rules automate all transitions, and CloudFront handles CDN distribution of the hot content.
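The age thresholds in this strategy (30, 90, and 365 days) can be expressed as a small lookup. This is only a sketch of what the lifecycle rules would do over time; `tier_for_age` is a hypothetical helper, and the tier names are the `StorageClass` constants used by the API.

```python
import bisect

# Age in days at which each transition happens, and the class on either side
TRANSITION_DAYS = [30, 90, 365]
TIERS = ["STANDARD", "STANDARD_IA", "GLACIER_IR", "DEEP_ARCHIVE"]


def tier_for_age(age_days):
    """Return the storage class a video would be in at the given age."""
    return TIERS[bisect.bisect_right(TRANSITION_DAYS, age_days)]


for age in (1, 30, 120, 400):
    print(age, tier_for_age(age))
```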
Mini Project: S3 Storage Cost Calculator
# s3_cost_calculator.py
# Estimate monthly S3 costs based on storage class and data volume


class S3CostEstimator:
    def __init__(self):
        # Illustrative prices in USD; per-GB retrieval fees are not modeled.
        # Check current AWS pricing for your region before relying on these.
        self.pricing = {
            "S3 Standard": {"per_gb": 0.023, "per_1000_put": 0.005, "per_1000_get": 0.0004},
            "S3 Standard-IA": {"per_gb": 0.0125, "per_1000_put": 0.01, "per_1000_get": 0.001},
            "S3 Glacier Instant": {"per_gb": 0.004, "per_1000_put": 0.02, "per_1000_get": 0.01},
            "S3 Glacier Deep Archive": {"per_gb": 0.00099, "per_1000_put": 0.05, "per_1000_get": 0.10},
        }

    def estimate(self, storage_class, data_gb, puts_per_month=0, gets_per_month=0):
        """Return a cost breakdown, or None for an unknown storage class."""
        pricing = self.pricing.get(storage_class)
        if not pricing:
            return None
        storage_cost = data_gb * pricing["per_gb"]
        put_cost = (puts_per_month / 1000) * pricing["per_1000_put"]
        get_cost = (gets_per_month / 1000) * pricing["per_1000_get"]
        total = storage_cost + put_cost + get_cost
        return {
            "class": storage_class,
            "storage_gb": data_gb,
            "storage_cost": round(storage_cost, 2),
            "put_cost": round(put_cost, 2),
            "get_cost": round(get_cost, 2),
            "total": round(total, 2),
        }


calc = S3CostEstimator()
print("=== S3 Monthly Cost Estimate ===")
scenarios = [
    ("S3 Standard", 500, 50000, 200000),
    ("S3 Standard-IA", 2000, 10000, 50000),
    ("S3 Glacier Instant", 10000, 1000, 5000),
]
for cls, gb, puts, gets in scenarios:
    result = calc.estimate(cls, gb, puts, gets)
    print(f"\n{result['class']} ({gb} GB, {puts} puts, {gets} gets):")
    print(f" Storage: ${result['storage_cost']} | PUT: ${result['put_cost']} | GET: ${result['get_cost']} | Total: ${result['total']}/mo")

Expected output:
=== S3 Monthly Cost Estimate ===

S3 Standard (500 GB, 50000 puts, 200000 gets):
 Storage: $11.5 | PUT: $0.25 | GET: $0.08 | Total: $11.83/mo

S3 Standard-IA (2000 GB, 10000 puts, 50000 gets):
 Storage: $25.0 | PUT: $0.1 | GET: $0.05 | Total: $25.15/mo

S3 Glacier Instant (10000 GB, 1000 puts, 5000 gets):
 Storage: $40.0 | PUT: $0.02 | GET: $0.05 | Total: $40.07/mo

What’s Next
You now understand Amazon S3 deeply! Next, explore cloud security for securing S3 buckets, and learn about AWS Lambda for processing S3 events.
- Practice daily — Create an S3 bucket and upload files via the console
- Build a project — Host a static website on S3 with CloudFront distribution
- Explore related topics — Check out S3 replication for cross-region disaster recovery
Remember: every expert was once a beginner. Keep coding!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro