Amazon S3 Deep Dive — Storage Classes, Versioning, and Python SDK

DodaTech Updated Jun 15, 2026 8 min read

Amazon S3 (Simple Storage Service) is a scalable object storage service for storing and retrieving any amount of data from anywhere. It is designed for 99.999999999% (11 nines) durability.

What You’ll Learn

By the end of this tutorial, you’ll understand S3 storage classes, versioning, lifecycle policies, presigned URLs, static website hosting, S3 Select, and how to perform common operations with the Python SDK (boto3).

Why Amazon S3 Matters

S3 is the foundation of data storage on AWS. Many other AWS services integrate with it: Lambda reads from it, Athena queries it, Redshift loads from it, and CloudFront distributes it. Across all customers, it holds well over 100 trillion objects. DodaTech uses S3 for Durga Antivirus Pro update distribution and Doda Browser analytics data lakes.

Amazon S3 Learning Path


flowchart LR
  A[AWS Fundamentals] --> B[Amazon S3]
  B --> C{You Are Here}
  C --> D[Storage Classes]
  C --> E[Versioning]
  C --> F[Lifecycle]
  C --> G[Security]
  D --> H[Standard]
  D --> I[Glacier]
  G --> J[Presigned URLs]
  G --> K[Bucket Policies]

Prerequisites: Python basics. Understanding of AWS and cloud storage concepts.

What Is Amazon S3?

Think of S3 like an infinite filing cabinet. Every file (object) goes into a drawer (bucket). Each object has data (the file), metadata (labels), and a unique key (file path). You can access any object from anywhere via URL. Unlike a regular filing cabinet, this one never runs out of space, automatically backs up everything, and can serve files to millions of people simultaneously.

Storage Classes

S3 offers storage classes optimized for different access patterns.

| Class | Durability | Availability | Min Storage | Retrieval | Use Case |
|---|---|---|---|---|---|
| S3 Standard | 99.999999999% | 99.99% | None | Instant | Frequently accessed data |
| S3 Intelligent-Tiering | 99.999999999% | 99.9% | None | Instant | Unknown access patterns |
| S3 Standard-IA | 99.999999999% | 99.9% | 30 days | Instant | Infrequent, accessed < 1/month |
| S3 One Zone-IA | 99.999999999% (single AZ) | 99.5% | 30 days | Instant | Recreatable data |
| S3 Glacier Instant | 99.999999999% | 99.9% | 90 days | Milliseconds | Archive with instant access |
| S3 Glacier Flexible | 99.999999999% | 99.99% | 90 days | Minutes to hours (1-5 min expedited) | Backup archives |
| S3 Glacier Deep Archive | 99.999999999% | 99.99% | 180 days | Within 12 hours | Long-term compliance |
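
To make the table easier to apply, here is a minimal sketch that maps a rough access pattern to a storage class. The function name `choose_storage_class` and its thresholds are illustrative assumptions, not AWS guidance. The returned strings are the values boto3 accepts for the `StorageClass` parameter.

```python
def choose_storage_class(reads_per_month, needs_millisecond_access=True,
                         recreatable=False, pattern_known=True):
    """Return a boto3 StorageClass string for a rough access pattern.

    The thresholds below are illustrative assumptions only.
    """
    if not pattern_known:
        # Let S3 move the object between tiers automatically.
        return "INTELLIGENT_TIERING"
    if reads_per_month >= 1:
        return "STANDARD"
    if not needs_millisecond_access:
        # Rarely read and a wait is acceptable: use the archive tiers.
        return "DEEP_ARCHIVE" if reads_per_month == 0 else "GLACIER"
    if recreatable:
        # Single-AZ storage is cheaper when the data can be rebuilt.
        return "ONEZONE_IA"
    # Read less than once a month but must be served immediately.
    return "STANDARD_IA" if reads_per_month >= 0.25 else "GLACIER_IR"
```

The result can be passed straight to an upload, for example `s3.put_object(Bucket=bucket, Key=key, Body=data, StorageClass=choose_storage_class(0.5))`.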

S3 Python SDK Operations

# s3_operations.py
# Common S3 operations with boto3
import boto3
import json
from datetime import datetime

# Initialize client
s3 = boto3.client('s3', region_name='us-east-1')

def list_buckets():
    """List all S3 buckets."""
    response = s3.list_buckets()
    buckets = [b['Name'] for b in response['Buckets']]
    print(f"Buckets: {buckets}")
    return buckets

def upload_file(bucket, key, data):
    """Upload data to S3."""
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    print(f"Uploaded: s3://{bucket}/{key}")

def download_file(bucket, key):
    """Download object from S3."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read().decode('utf-8')

def list_objects(bucket, prefix=''):
    """List objects in a bucket."""
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get('Contents', [])
    print(f"Objects in s3://{bucket}/{prefix}:")
    for obj in objects:
        size_kb = obj['Size'] / 1024
        modified = obj['LastModified'].strftime('%Y-%m-%d %H:%M')
        print(f"  {obj['Key']:<40} {size_kb:>8.1f}KB  {modified}")
    return objects

def generate_presigned_url(bucket, key, expires_in=3600):
    """Generate a presigned URL for temporary access."""
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in,
    )
    print(f"Presigned URL (expires in {expires_in}s):")
    print(f"  {url[:80]}...")
    return url

def set_bucket_versioning(bucket, status='Enabled'):
    """Enable or suspend bucket versioning."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': status},
    )
    print(f"Versioning {status.lower()} for {bucket}")

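from botocore.exceptions import ClientError

def get_bucket_lifecycle(bucket):
    """Print and return the lifecycle rules for a bucket."""
    try:
        response = s3.get_bucket_lifecycle_configuration(Bucket=bucket)
    except ClientError as e:
        # S3 returns this error code when the bucket has no lifecycle rules.
        if e.response['Error']['Code'] == 'NoSuchLifecycleConfiguration':
            print(f"No lifecycle config for {bucket}")
            return []
        raise
    rules = response.get('Rules', [])
    print(f"Lifecycle rules for {bucket}:")
    for rule in rules:
        # 'ID' is optional in a lifecycle rule.
        print(f"  {rule.get('ID', '(no ID)')}: {rule['Status']}")
    return rules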

# Simulate operations without AWS
class S3Simulator:
    def __init__(self):
        self.buckets = {}

    def create_bucket(self, name):
        self.buckets[name] = {}
        print(f"[S3] Created bucket: {name}")

    def put_object(self, bucket, key, data):
        if bucket not in self.buckets:
            raise ValueError(f"Bucket {bucket} doesn't exist")
        self.buckets[bucket][key] = {
            "data": data,
            "size": len(data),
            "last_modified": datetime.now().isoformat(),
        }
        print(f"[S3] Put: s3://{bucket}/{key} ({len(data)} bytes)")

    def get_object(self, bucket, key):
        obj = self.buckets.get(bucket, {}).get(key)
        return obj

    def generate_url(self, bucket, key, expires=3600):
        return f"https://{bucket}.s3.amazonaws.com/{key}?X-Amz-Expires={expires}"

# Demo
s3_sim = S3Simulator()
print("=== Amazon S3 Operations Demo ===\n")

s3_sim.create_bucket("dodatech-data-lake")
s3_sim.put_object("dodatech-data-lake", "logs/2026/06/15/app.log", "INFO: Server started\nERROR: Connection timeout\nINFO: Retry succeeded")
s3_sim.put_object("dodatech-data-lake", "data/sales_2026.csv", "order_id,amount,status\n1001,250.00,completed\n1002,45.50,pending")

obj = s3_sim.get_object("dodatech-data-lake", "data/sales_2026.csv")
print(f"\nRetrieved object: {obj['size']} bytes")

url = s3_sim.generate_url("dodatech-data-lake", "data/sales_2026.csv")
print(f"Presigned URL: {url}")

print("\n=== S3 Storage Classes Comparison ===")
classes = {
    "S3 Standard": {"cost_per_gb": 0.023, "retrieval": "Instant", "min_days": 0},
    "S3 Standard-IA": {"cost_per_gb": 0.0125, "retrieval": "Instant", "min_days": 30},
    "S3 Glacier Instant": {"cost_per_gb": 0.004, "retrieval": "Instant", "min_days": 90},
    "S3 Glacier Deep Archive": {"cost_per_gb": 0.00099, "retrieval": "12 hours", "min_days": 180},
}
for name, info in classes.items():
    print(f"  {name:<25} ${info['cost_per_gb']:<8} {info['retrieval']:<15} {info['min_days']}d min")

Expected output:

=== Amazon S3 Operations Demo ===

[S3] Created bucket: dodatech-data-lake
[S3] Put: s3://dodatech-data-lake/logs/2026/06/15/app.log (68 bytes)
[S3] Put: s3://dodatech-data-lake/data/sales_2026.csv (63 bytes)

Retrieved object: 63 bytes
Presigned URL: https://dodatech-data-lake.s3.amazonaws.com/data/sales_2026.csv?X-Amz-Expires=3600

=== S3 Storage Classes Comparison ===
  S3 Standard               $0.023    Instant         0d min
  S3 Standard-IA            $0.0125   Instant         30d min
  S3 Glacier Instant        $0.004    Instant         90d min
  S3 Glacier Deep Archive   $0.00099  12 hours        180d min
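
One caveat about `list_objects` above: a single `list_objects_v2` call returns at most 1,000 keys. A minimal sketch of listing everything with boto3's built-in paginator follows. The helper name `iter_all_keys` is an assumption for illustration, and the client is passed in so the function works with any S3 client.

```python
def iter_all_keys(client, bucket, prefix=""):
    """Yield every key under a prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent on pages with no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With a real client this would be called as `for key in iter_all_keys(boto3.client('s3'), 'dodatech-data-lake', 'logs/'): ...`.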

Static Website Hosting

S3 can host static websites (HTML, CSS, JS) without any server.

# s3_static_site.py
# Configure S3 for static website hosting
import json

import boto3

def configure_static_site(bucket_name):
    """Configure an S3 bucket as a static website."""
    s3 = boto3.client('s3')

    # Enable static website hosting
    s3.put_bucket_website(
        Bucket=bucket_name,
        WebsiteConfiguration={
            'IndexDocument': {'Suffix': 'index.html'},
            'ErrorDocument': {'Key': '404.html'},
        }
    )

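    # Allow public reads. This call is rejected while Block Public Access
    # is enabled for the bucket or account. To keep the bucket private,
    # serve it through CloudFront with origin access control (OAC) instead.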
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*"
        }]
    }
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(bucket_policy))

    # This endpoint assumes the bucket is in us-east-1; the format differs by region.
    print(f"Static site ready: http://{bucket_name}.s3-website-us-east-1.amazonaws.com")

# Simulate
print("=== S3 Static Website Hosting ===")
print("""
Files needed:
  index.html   → Homepage
  404.html     → Error page
  style.css    → Styles
  script.js    → Scripts

Steps:
  1. Create bucket with same name as domain (or use subdomain)
  2. Enable "Static website hosting" → set index/error docs
  3. Set bucket policy for public read
  4. Upload files
  5. (Optional) Point Route 53 domain to S3 endpoint
""")

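For the upload step, browsers only render pages correctly if each object carries the right Content-Type. The sketch below guesses it from the file extension. The helper names `website_upload_plan` and `upload_site` are assumptions for illustration, and the client is passed in rather than created here.

```python
import mimetypes
from pathlib import Path

def website_upload_plan(site_dir):
    """Map each file under site_dir to (local path, S3 key, Content-Type)."""
    root = Path(site_dir)
    plan = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Keys always use forward slashes, regardless of the local OS.
        key = path.relative_to(root).as_posix()
        content_type, _ = mimetypes.guess_type(path.name)
        plan.append((str(path), key, content_type or "application/octet-stream"))
    plan.sort(key=lambda item: item[1])
    return plan

def upload_site(client, bucket, site_dir):
    """Upload every planned file with its Content-Type set."""
    for local_path, key, content_type in website_upload_plan(site_dir):
        client.upload_file(local_path, bucket, key,
                           ExtraArgs={"ContentType": content_type})
```

Splitting the plan from the upload keeps the mapping easy to inspect before anything is sent to S3.
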
S3 Select

S3 Select retrieves a subset of an object's data using a SQL expression, so you don't need to download the entire object. Note that AWS has closed S3 Select to new customers; existing users can continue to use it.

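-- S3 Select SQL example
-- Query a CSV in S3 without downloading the whole file
-- Column names require FileHeaderInfo = USE in the request
-- CSV fields are read as strings, so cast before comparing numerically
SELECT s.order_id, s.amount
FROM s3object s
WHERE s.status = 'completed'
  AND CAST(s.amount AS FLOAT) > 100
LIMIT 10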
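
A minimal sketch of running this query from Python with boto3's `select_object_content` follows. The function name `select_completed_orders` is an assumption for illustration. The amount is cast to a float because S3 Select reads CSV fields as strings, and the client is passed in rather than created here.

```python
def select_completed_orders(client, bucket, key, min_amount=100):
    """Run the query with S3 Select and return the matching CSV text."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=(
            "SELECT s.order_id, s.amount FROM s3object s "
            "WHERE s.status = 'completed' "
            f"AND CAST(s.amount AS FLOAT) > {float(min_amount)} LIMIT 10"
        ),
        # FileHeaderInfo=USE lets the query refer to columns by header name.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

Record chunks can split a row across events, so the sketch joins all bytes before decoding.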

Common Amazon S3 Mistakes

1. Public Buckets with Sensitive Data

Accidentally public S3 buckets are a recurring cause of data leaks. Turn on Block Public Access at the account level and audit bucket settings with AWS Config.
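
As a rough sketch, the bucket-level version of this setting can be applied with boto3's `put_public_access_block`. The helper name `block_all_public_access` is an assumption for illustration. The account-level setting goes through a separate API and is not shown here.

```python
def block_all_public_access(client, bucket):
    """Turn on all four Block Public Access settings for one bucket."""
    client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,        # reject new public ACLs
            "IgnorePublicAcls": True,       # ignore existing public ACLs
            "BlockPublicPolicy": True,      # reject public bucket policies
            "RestrictPublicBuckets": True,  # limit access under public policies
        },
    )
```

With a real client this would be called as `block_all_public_access(boto3.client('s3'), 'dodatech-data-lake')`.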

2. Not Enabling Versioning

Without versioning, accidental deletes or overwrites are permanent. Enable versioning on all production buckets. The extra storage cost is usually small compared to the cost of losing data, and a lifecycle rule can expire old noncurrent versions.
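
To show what recovery looks like once versioning is on, here is a minimal sketch that restores the previous version of an object by copying it back on top. The helper name `restore_previous_version` is an assumption for illustration, and it reads only the first page of versions.

```python
def restore_previous_version(client, bucket, key):
    """Copy the newest non-current version of key back as the latest.

    Returns the restored VersionId, or None if no older version exists.
    """
    response = client.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys, so filter on an exact match.
    older = [
        v for v in response.get("Versions", [])
        if v["Key"] == key and not v["IsLatest"]
    ]
    if not older:
        return None
    previous = max(older, key=lambda v: v["LastModified"])
    # Copying a specific version creates a new latest version with its data.
    client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key,
                    "VersionId": previous["VersionId"]},
    )
    return previous["VersionId"]
```

Copying rather than deleting the bad version keeps the full history intact, so the restore itself can be undone.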

3. No Lifecycle Policies

Data accumulates indefinitely. Old logs, temporary exports, and staging files grow costs. Set lifecycle rules to transition objects to cheaper tiers and delete after retention periods.
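
A lifecycle rule of this kind might look like the sketch below, which tiers and then expires objects under a `logs/` prefix. The helper names, the rule ID, and the day counts are assumptions for illustration. The dictionary keys are those accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
def build_log_lifecycle(prefix="logs/"):
    """Build a lifecycle configuration that tiers, then deletes, old logs."""
    return {
        "Rules": [{
            "ID": "tier-then-expire-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER_IR"},
            ],
            # Delete current versions after one year.
            "Expiration": {"Days": 365},
            # On versioned buckets, also clean up old versions.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            # Remove abandoned multipart uploads, which are billed as storage.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    }

def apply_lifecycle(client, bucket, config):
    """Replace the bucket's lifecycle configuration with config."""
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Note that applying a configuration replaces any existing rules on the bucket, so existing rules need to be merged in first.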

4. Using S3 Like a File System

S3 is object storage, not a file system. There is no rename (it is a copy followed by a delete) and no in-place edit (you re-upload the whole object). Don't use it for high-IOPS random access; use EBS or EFS instead.

5. Ignoring Request Pricing

S3 charges per request (PUT, GET, LIST). For workloads with millions of small objects, request costs can exceed storage costs. Combine small records into larger objects before uploading to cut the number of requests.
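
A quick worked example makes the difference concrete. This sketch uses the S3 Standard prices quoted elsewhere in this tutorial ($0.023 per GB-month, $0.005 per 1,000 PUTs) as assumptions; the function name `small_object_costs` is hypothetical.

```python
def small_object_costs(object_count, object_size_bytes,
                       per_gb_month=0.023, per_1000_put=0.005):
    """Return (monthly storage cost, one-time PUT cost) in dollars."""
    total_gb = object_count * object_size_bytes / 1024**3
    storage = total_gb * per_gb_month
    put = object_count / 1000 * per_1000_put
    return round(storage, 2), round(put, 2)

# 10 million objects of 1 KiB each: uploads cost far more than storage.
print(small_object_costs(10_000_000, 1024))      # (0.22, 50.0)

# Roughly the same data as 10,000 objects of 1 MiB each.
print(small_object_costs(10_000, 1024 * 1024))   # (0.22, 0.05)
```

Under these assumed prices, aggregating the data cuts the upload cost from $50 to $0.05 while storage stays at about $0.22 per month.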

Practice Questions

1. What is S3’s durability and what does it mean?

S3 is designed for 99.999999999% (11 nines) durability. Statistically, if you store 10 million objects, you can expect to lose one object every 10,000 years on average. Except in the One Zone classes, objects are stored redundantly across multiple Availability Zones.
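
The 10,000-year figure follows from simple arithmetic, sketched below. It assumes the durability number is read as a per-object survival probability over one year. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# 11 nines of durability means each object has a 1e-11 chance of loss per year.
annual_loss_rate = 1 - Fraction(99_999_999_999, 100_000_000_000)

objects = 10_000_000
expected_losses_per_year = objects * annual_loss_rate   # 1/10000
years_per_lost_object = 1 / expected_losses_per_year

print(years_per_lost_object)  # 10000
```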

2. What are S3 storage classes and when would you use each?

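Standard (frequent access), Intelligent-Tiering (unknown or changing access patterns), Standard-IA (infrequent access), One Zone-IA (infrequent, recreatable data), Glacier Instant (archive, millisecond retrieval), Glacier Flexible (backup, retrieval in minutes to hours), Glacier Deep Archive (compliance, retrieval within 12 hours). Choose based on access frequency and retrieval time needs.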

3. How does S3 versioning work?

When enabled, S3 keeps every version of an object (including deletes as delete markers). You can list, retrieve, and restore any previous version. It protects against accidental deletion and overwrites.

4. What are presigned URLs?

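Temporary URLs that grant access to private S3 objects without the recipient needing AWS credentials. Each URL is signed with the creator's credentials, carries only the creator's permissions, and stops working after its expiration time. Used for secure file sharing, uploads, and downloads.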

5. Challenge: Design an S3 storage strategy for a video platform with hot content (accessed daily), warm content (accessed monthly), and archived content (compliance, accessed rarely). Cost-optimize the storage.

Keep new uploads in Standard while they are hot. Lifecycle rules then transition objects to Standard-IA at 30 days (the earliest S3 allows that transition), to Glacier Instant at 90 days, and to Glacier Deep Archive at 1 year for compliance retention. Lifecycle rules automate all transitions. Use CloudFront as the CDN in front of the hot content.

Mini Project: S3 Storage Cost Calculator

# s3_cost_calculator.py
# Estimate monthly S3 costs based on storage class and data volume

class S3CostEstimator:
    def __init__(self):
        self.pricing = {
            "S3 Standard": {"per_gb": 0.023, "per_1000_put": 0.005, "per_1000_get": 0.0004},
            "S3 Standard-IA": {"per_gb": 0.0125, "per_1000_put": 0.01, "per_1000_get": 0.001},
            "S3 Glacier Instant": {"per_gb": 0.004, "per_1000_put": 0.02, "per_1000_get": 0.01},
            "S3 Glacier Deep Archive": {"per_gb": 0.00099, "per_1000_put": 0.05, "per_1000_get": 0.10},
        }

    def estimate(self, storage_class, data_gb, puts_per_month=0, gets_per_month=0):
        pricing = self.pricing.get(storage_class)
        if not pricing:
            return None

        storage_cost = data_gb * pricing["per_gb"]
        put_cost = (puts_per_month / 1000) * pricing["per_1000_put"]
        get_cost = (gets_per_month / 1000) * pricing["per_1000_get"]
        total = storage_cost + put_cost + get_cost

        return {
            "class": storage_class,
            "storage_gb": data_gb,
            "storage_cost": round(storage_cost, 2),
            "put_cost": round(put_cost, 2),
            "get_cost": round(get_cost, 2),
            "total": round(total, 2),
        }

calc = S3CostEstimator()
print("=== S3 Monthly Cost Estimate ===")
scenarios = [
    ("S3 Standard", 500, 50000, 200000),
    ("S3 Standard-IA", 2000, 10000, 50000),
    ("S3 Glacier Instant", 10000, 1000, 5000),
]

for cls, gb, puts, gets in scenarios:
    result = calc.estimate(cls, gb, puts, gets)
    print(f"\n{result['class']} ({gb} GB, {puts} puts, {gets} gets):")
    print(f"  Storage: ${result['storage_cost']} | PUT: ${result['put_cost']} | GET: ${result['get_cost']} | Total: ${result['total']}/mo")

Expected output:

=== S3 Monthly Cost Estimate ===

S3 Standard (500 GB, 50000 puts, 200000 gets):
  Storage: $11.5 | PUT: $0.25 | GET: $0.08 | Total: $11.83/mo

S3 Standard-IA (2000 GB, 10000 puts, 50000 gets):
  Storage: $25.0 | PUT: $0.1 | GET: $0.05 | Total: $25.15/mo

S3 Glacier Instant (10000 GB, 1000 puts, 5000 gets):
  Storage: $40.0 | PUT: $0.02 | GET: $0.05 | Total: $40.07/mo

What’s Next

You now understand Amazon S3 deeply! Next, explore cloud security for securing S3 buckets, and learn about AWS Lambda for processing S3 events.

  • Practice daily — Create an S3 bucket and upload files via the console
  • Build a project — Host a static website on S3 with CloudFront distribution
  • Explore related topics — Check out S3 replication for cross-region disaster recovery

Remember: every expert was once a beginner. Keep coding!
