Amazon DynamoDB Guide — NoSQL Database on AWS
Amazon DynamoDB is a fully managed NoSQL key-value and document database on AWS, designed for single-digit millisecond latency at any scale, with automatic scaling, built-in security, and pay-per-request pricing.
What You’ll Learn
By the end of this tutorial, you’ll understand DynamoDB’s table design with partition and sort keys, create and query secondary indexes, configure provisioned and on-demand capacity modes, use DAX for caching, process change streams, and estimate costs for your workloads.
Why DynamoDB Matters
DynamoDB powers thousands of high-traffic applications on AWS: Amazon.com, Lyft, Airbnb, Snapchat, and Samsung all rely on it for consistent performance at massive scale. DodaZIP uses DynamoDB for its user session store and file metadata lookups. Durga Antivirus Pro uses DynamoDB for its real-time threat detection lookup tables. Learning DynamoDB is a valuable skill for cloud-native application development on AWS.
DynamoDB Learning Path
flowchart LR
A[SQL Basics] --> B[MongoDB]
B --> C[DynamoDB]
C --> D[Redis]
D --> E[Elasticsearch]
E --> F[Database Design]
C --> G{You Are Here}
style G fill:#f90,color:#fff
What Is DynamoDB? (The “Why” First)
Think of DynamoDB as a database that scales without you having to think about it. With SQL databases, scaling means bigger servers, read replicas, or sharding, all of which is manual work. DynamoDB is fully managed: you create a table, and AWS automatically distributes data across partitions and scales throughput. If your traffic spikes 10x, DynamoDB is designed to absorb it, within the limits of the table's capacity mode. The trade-off: you give up JOINs and ad hoc complex queries, and multi-item transactions are limited (DynamoDB offers ACID transactions through TransactWriteItems and TransactGetItems, but only over a limited number of items per request). DynamoDB is built for key-based lookups and queries at massive scale.
DynamoDB vs Traditional SQL
| Feature | SQL (MySQL/PostgreSQL) | DynamoDB |
|---|---|---|
| Scaling | Manual (sharding, replicas) | Automatic (partitioning) |
| Consistency | Strong (ACID) | Eventually consistent reads by default; strongly consistent reads optional |
| Queries | Complex (JOINs, subqueries) | Simple (key-value, single table) |
| Schema | Fixed schema (ALTER TABLE) | Schema-less (only key required) |
| Management | Self-managed or RDS | Fully managed (zero ops) |
| Latency | Variable (depends on query) | Single-digit ms (predictable) |
Tables and Primary Keys
Every DynamoDB table has a primary key — either a simple partition key or a composite (partition + sort) key:
# Create a table with partition key only (simple key)
aws dynamodb create-table \
--table-name users \
--attribute-definitions AttributeName=user_id,AttributeType=S \
--key-schema AttributeName=user_id,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
# Response:
# {"TableDescription": {"TableName": "users", "TableStatus": "CREATING", ...}}
# Create a table with partition key + sort key (composite key)
aws dynamodb create-table \
--table-name orders \
--attribute-definitions \
AttributeName=customer_id,AttributeType=S \
AttributeName=order_date,AttributeType=S \
--key-schema \
AttributeName=customer_id,KeyType=HASH \
AttributeName=order_date,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
Key Design:
- Partition key (HASH): determines which partition stores the item. All items with the same partition key are stored together.
- Sort key (RANGE): sorts items within a partition. Enables range queries (BETWEEN, >, <, begins_with).
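As a rough mental model of the two key roles, the sketch below hashes the partition key to choose a partition and uses the sort key to order and range-filter items inside it. The MD5 hash, the fixed partition count, and the `ToyTable` class are illustrative assumptions; DynamoDB's internal hashing and partition management are not exposed and work differently.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: a fixed count, purely for illustration


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition number -> {(partition_key, sort_key): item}
        self.partitions = defaultdict(dict)

    def put(self, pk: str, sk: str, item: dict) -> None:
        # Same (pk, sk) replaces the existing item, like put-item does
        self.partitions[partition_for(pk)][(pk, sk)] = item

    def query_between(self, pk: str, start: str, end: str) -> list:
        # Only one partition is consulted; results come back in sort-key order
        part = self.partitions[partition_for(pk)]
        keys = sorted(k for k in part if k[0] == pk and start <= k[1] <= end)
        return [part[k] for k in keys]


table = ToyTable()
table.put("customer_001", "2026-06-15", {"total": 30})
table.put("customer_001", "2026-06-01", {"total": 10})
table.put("customer_001", "2026-07-02", {"total": 99})
table.put("customer_002", "2026-06-10", {"total": 5})
june = table.query_between("customer_001", "2026-06-01", "2026-06-30")
```

Because ISO-8601 dates sort correctly as strings, a string sort key is enough for the date-range query here.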
CRUD Operations with AWS CLI
# Put an item (create or replace)
aws dynamodb put-item \
--table-name users \
--item '{
"user_id": {"S": "user_001"},
"name": {"S": "Alice Johnson"},
"email": {"S": "alice@example.com"},
"age": {"N": "28"},
"city": {"S": "New York"},
"premium": {"BOOL": true}
}'
# Get an item by primary key
aws dynamodb get-item \
--table-name users \
--key '{"user_id": {"S": "user_001"}}'
# Response:
# {"Item": {"user_id": {"S": "user_001"}, "name": {"S": "Alice Johnson"}, ...}}
# Update an item (partial update)
aws dynamodb update-item \
--table-name users \
--key '{"user_id": {"S": "user_001"}}' \
--update-expression "SET age = :new_age, city = :new_city" \
--expression-attribute-values '{
":new_age": {"N": "29"},
":new_city": {"S": "Boston"}
}'
# Delete an item
aws dynamodb delete-item \
--table-name users \
--key '{"user_id": {"S": "user_001"}}'
# Query items by partition key
aws dynamodb query \
--table-name orders \
--key-condition-expression "customer_id = :id" \
--expression-attribute-values '{":id": {"S": "customer_001"}}'
# Query with sort key range
aws dynamodb query \
--table-name orders \
--key-condition-expression "customer_id = :id AND order_date BETWEEN :start AND :end" \
--expression-attribute-values '{
":id": {"S": "customer_001"},
":start": {"S": "2026-06-01"},
":end": {"S": "2026-06-30"}
}'
Secondary Indexes
DynamoDB supports two types of secondary indexes for querying non-key attributes:
# Create a Global Secondary Index (GSI) — own throughput, can be created anytime
aws dynamodb update-table \
--table-name orders \
--attribute-definitions AttributeName=status,AttributeType=S \
--global-secondary-index-updates '[
{
"Create": {
"IndexName": "status-index",
"KeySchema": [
{"AttributeName": "status", "KeyType": "HASH"}
],
"Projection": {"ProjectionType": "ALL"}
}
}
]'
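# Create a Local Secondary Index (LSI): shares throughput with the table and can only be defined at table creation
# An LSI uses the same partition key as the table but a different sort key
# Note: this command creates the orders table again, so it fails with ResourceInUseException if the earlier orders table still exists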
aws dynamodb create-table \
--table-name orders \
--attribute-definitions \
AttributeName=customer_id,AttributeType=S \
AttributeName=order_date,AttributeType=S \
AttributeName=total,AttributeType=N \
--key-schema \
AttributeName=customer_id,KeyType=HASH \
AttributeName=order_date,KeyType=RANGE \
--local-secondary-indexes '[
{
"IndexName": "customer_total_index",
"KeySchema": [
{"AttributeName": "customer_id", "KeyType": "HASH"},
{"AttributeName": "total", "KeyType": "RANGE"}
],
"Projection": {"ProjectionType": "ALL"}
}
]' \
--billing-mode PAY_PER_REQUEST
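# Query using GSI
# "status" is a DynamoDB reserved word, so it must be aliased with a placeholder (#s)
aws dynamodb query \
--table-name orders \
--index-name status-index \
--key-condition-expression "#s = :s" \
--expression-attribute-names '{"#s": "status"}' \
--expression-attribute-values '{":s": {"S": "PENDING"}}'
GSI vs LSI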
| Feature | GSI | LSI |
|---|---|---|
| Partition key | Any attribute | Same as table |
| Creation | Any time | Table creation only |
| Throughput | Separate (extra cost) | Shared with table |
| Consistency | Eventually consistent | Strong or eventual |
| Limit per table | 20 | 5 |
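Two rows of this comparison affect how you write index queries from code: a GSI cannot serve strongly consistent reads, and an index key such as `status` is a reserved word that needs a placeholder. The sketch below builds the keyword arguments for boto3's `query` call with both rules applied. The helper name `build_index_query` and its `index_type` argument are hypothetical; the dictionary keys are the standard Query parameters.

```python
def build_index_query(table_name, index_name, key_name, key_value,
                      index_type="GSI", consistent_read=False):
    """Return kwargs for a DynamoDB Query against a secondary index.

    Assumes the index partition key is a string attribute.
    """
    if index_type not in ("GSI", "LSI"):
        raise ValueError("index_type must be 'GSI' or 'LSI'")
    if index_type == "GSI" and consistent_read:
        # GSIs only support eventually consistent reads
        raise ValueError("strongly consistent reads are not available on a GSI")
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Placeholders avoid clashes with reserved words such as "status"
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
        "ConsistentRead": consistent_read,
    }


params = build_index_query("orders", "status-index", "status", "PENDING")
# With AWS credentials configured, the request would be sent as:
#   boto3.client("dynamodb").query(**params)
```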
Read/Write Capacity Modes
# On-demand mode (pay per request, auto-scales)
# Best for unpredictable workloads
aws dynamodb create-table \
--table-name sessions \
--billing-mode PAY_PER_REQUEST \
...
# Provisioned mode (fixed throughput, cost-effective for steady loads)
aws dynamodb create-table \
--table-name logs \
--billing-mode PROVISIONED \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
...
# Auto-scaling with provisioned capacity
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id "table/logs" \
--scalable-dimension "dynamodb:table:ReadCapacityUnits" \
--min-capacity 5 \
--max-capacity 100
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id "table/logs" \
--scalable-dimension "dynamodb:table:ReadCapacityUnits" \
--policy-name "scale-read" \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBReadCapacityUtilization"
}
}'
DAX: DynamoDB Accelerator
DAX is an in-memory cache that sits between your application and DynamoDB:
flowchart LR
App[Application] --> DAX[DynamoDB Accelerator]
DAX --> DDB[DynamoDB Table]
DAX --> Cache[In-Memory Cache]
App -.->|Direct Bypass| DDB
Cache -.->|Get Item| DDB
style DAX fill:#f90,color:#fff
DAX reduces read latency from single-digit milliseconds to microseconds by caching frequently accessed items:
import boto3
from amazondax import AmazonDaxClient  # pip install amazon-dax-client
# Regular DynamoDB client (2-5ms latency)
regular_client = boto3.client('dynamodb', region_name='us-east-1')
# DAX client (microsecond latency on cache hits); the cluster is only reachable from inside its VPC
dax_client = AmazonDaxClient(
    session=boto3.Session(region_name='us-east-1'),
    endpoints=['my-dax-cluster.dax-clusters.us-east-1.amazonaws.com:8111']
)
# Same API calls, much lower latency when the item is already cached
response = dax_client.get_item(
    TableName='sessions',
    Key={'session_id': {'S': 'abc123'}}
)
DynamoDB Streams
DynamoDB Streams capture changes to a table in near-real-time:
# Enable streams on a table
aws dynamodb update-table \
--table-name orders \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# Describe stream
aws dynamodb describe-table --table-name orders
# Look for LatestStreamArn in output
# Read stream records (via AWS Lambda or manual)
aws dynamodbstreams get-shard-iterator \
--stream-arn "arn:aws:dynamodb:..." \
--shard-id "shardId-..." \
--shard-iterator-type TRIM_HORIZON
aws dynamodbstreams get-records \
--shard-iterator "..."
Stream Use Cases
# Process stream records with Lambda
# Assumes a NEW_AND_OLD_IMAGES stream and that every order item carries order_id and status attributes
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            item = record['dynamodb']['NewImage']
            print(f"New order: {item['order_id']['S']}")
        elif record['eventName'] == 'MODIFY':
            old = record['dynamodb']['OldImage']
            new = record['dynamodb']['NewImage']
            print(f"Order {new['order_id']['S']} changed from {old['status']['S']} to {new['status']['S']}")
        elif record['eventName'] == 'REMOVE':
            item = record['dynamodb']['OldImage']
            print(f"Order deleted: {item['order_id']['S']}")
    return {'statusCode': 200}
This pattern powers Durga Antivirus Pro’s real-time threat alert pipeline: every new threat signature triggers downstream processing.
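Stream handlers are easier to trust when they can be exercised locally before deployment. The sketch below is one way to do that, assuming the same `order_id` and `status` attributes as the handler above: it converts the typed images into plain Python values and turns each record into a summary string. The `plain` and `summarize` helpers and the sample event are hypothetical, and the converter covers only the S, N, and BOOL types.

```python
from decimal import Decimal


def plain(image: dict) -> dict:
    """Convert a DynamoDB-typed image ({"attr": {"S": "x"}}) to plain values."""
    out = {}
    for name, typed in image.items():
        type_tag, value = next(iter(typed.items()))
        if type_tag == "N":
            out[name] = Decimal(value)  # numbers arrive as strings
        elif type_tag in ("S", "BOOL"):
            out[name] = value
        else:
            raise ValueError(f"unsupported type in this sketch: {type_tag}")
    return out


def summarize(record: dict) -> str:
    """Describe one stream record; OldImage/NewImage may be absent."""
    change = record["dynamodb"]
    old = plain(change.get("OldImage", {}))
    new = plain(change.get("NewImage", {}))
    event_name = record["eventName"]
    if event_name == "INSERT":
        return f"New order: {new['order_id']}"
    if event_name == "MODIFY":
        return f"Order {new['order_id']} changed from {old['status']} to {new['status']}"
    if event_name == "REMOVE":
        return f"Order deleted: {old['order_id']}"
    raise ValueError(f"unexpected eventName: {event_name}")


# Hand-written sample shaped like a DynamoDB Streams Lambda event
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {
        "NewImage": {"order_id": {"S": "o-1"}, "status": {"S": "PENDING"}}}},
    {"eventName": "MODIFY", "dynamodb": {
        "OldImage": {"order_id": {"S": "o-1"}, "status": {"S": "PENDING"}},
        "NewImage": {"order_id": {"S": "o-1"}, "status": {"S": "SHIPPED"}}}},
    {"eventName": "REMOVE", "dynamodb": {
        "OldImage": {"order_id": {"S": "o-1"}, "status": {"S": "SHIPPED"}}}},
]}
summaries = [summarize(r) for r in sample_event["Records"]]
```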
DynamoDB Pricing
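| Component | On-Demand | Provisioned |
|---|---|---|
| Write | $1.25 per million write request units | $0.00065 per WCU-hour |
| Read | $0.25 per million read request units | $0.00013 per RCU-hour |
| Storage | $0.25 per GB-month | $0.25 per GB-month |
| Streams | 24-hour retention; reads made by Lambda triggers are free, other stream reads are billed per request | Same |
| DAX | $0.12+/hour per node | Same |
| GSI | Same as table pricing | Same as table pricing |
Prices vary by Region and change over time, so treat these figures as illustrative and confirm them on the AWS pricing page.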
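To see how the two billing models compare for a given workload, the sketch below plugs the read and write rates from the pricing table into a small calculator. It rests on several simplifying assumptions: a 730-hour month, perfectly steady traffic, every write at most 1 KB and every read a strongly consistent read of at most 4 KB (so one request consumes exactly one unit), and no storage, stream, or DAX charges. The function names are hypothetical.

```python
HOURS_PER_MONTH = 730                       # assumption: average month length
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600

# Rates copied from the pricing table (illustrative, Region-dependent)
ON_DEMAND_PER_MILLION_WRITES = 1.25
ON_DEMAND_PER_MILLION_READS = 0.25
PROVISIONED_PER_WCU_HOUR = 0.00065
PROVISIONED_PER_RCU_HOUR = 0.00013


def on_demand_monthly(reads_per_sec: float, writes_per_sec: float) -> float:
    """Monthly request cost in on-demand mode for a steady request rate."""
    reads = reads_per_sec * SECONDS_PER_MONTH
    writes = writes_per_sec * SECONDS_PER_MONTH
    return (reads / 1e6 * ON_DEMAND_PER_MILLION_READS
            + writes / 1e6 * ON_DEMAND_PER_MILLION_WRITES)


def provisioned_monthly(rcu: int, wcu: int) -> float:
    """Monthly cost of holding a fixed number of capacity units."""
    return HOURS_PER_MONTH * (rcu * PROVISIONED_PER_RCU_HOUR
                              + wcu * PROVISIONED_PER_WCU_HOUR)


# Steady 100 reads/s and 20 writes/s, provisioned exactly at that rate
on_demand = on_demand_monthly(100, 20)      # about 131.40
provisioned = provisioned_monthly(100, 20)  # about 18.98
```

The gap is widest for steady traffic like this. For spiky workloads, a provisioned table has to be sized for the peak (or rely on auto scaling), which narrows or reverses the difference.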
Common DynamoDB Errors
1. ProvisionedThroughputExceededException
Your application is exceeding the provisioned read/write capacity. Fix: Use exponential backoff in your code, increase provisioned capacity, or switch to on-demand mode.
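import time
from botocore.exceptions import ClientError
def dynamodb_retry(func, max_retries=5):
    """Call func(), retrying with exponential backoff when DynamoDB throttles."""
    for attempt in range(max_retries):
        try:
            return func()
        except ClientError as err:
            # Re-raise anything that is not a throttling error
            if err.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            time.sleep(2 ** attempt * 0.1)  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError("Max retries exceeded")
2. ResourceNotFoundException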
Table or index doesn’t exist. Fix: Check the table name spelling. Use aws dynamodb list-tables to verify table names.
3. ConditionalCheckFailedException
A condition expression on a write operation was not met. Fix: Check the condition logic. Common pattern: ConditionExpression="attribute_not_exists(pk)" (where pk is your partition key attribute) to prevent overwriting an existing item.
4. Hot Partition
A single partition key is receiving too many requests. Fix: Add randomness to the partition key (e.g., user_id + random_suffix), or use a write sharding pattern.
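A minimal sketch of the write sharding idea, assuming a fixed shard count of 10 and a `#` separator (both arbitrary choices, as are the helper names): writes append a random suffix so one logical key spreads over several partition key values, and reads fan out over every suffix and merge the results.

```python
import random

SHARD_COUNT = 10  # assumption: tune to the write rate of the hottest key


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
    """Partition key to use when writing: base key plus a random shard suffix."""
    return f"{base_key}#{random.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list:
    """Every partition key a reader must query to see all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


write_key = sharded_key("user_001")
read_keys = all_shard_keys("user_001")
```

The cost of this pattern is on the read side: fetching everything for `user_001` now takes one Query per shard key, so the shard count should stay as small as the write load allows.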
5. ItemCollectionSizeLimitExceededException
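An item collection (all items that share one partition key value, across the table and its local secondary indexes) exceeds 10 GB. This limit applies only to tables that have an LSI. Fix: Redesign the partition key to spread data across more partition key values.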
6. ValidationException: The provided key element does not match the schema
The attribute type in your request doesn’t match the table schema. Fix: Ensure attribute types (S, N, B) match the AttributeType specified when creating the table.
7. LimitExceededException
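You’ve exceeded a DynamoDB account-level quota (e.g., the per-Region table quota, the default of 20 GSIs per table, or too many concurrent table create, update, or delete operations). Fix: Request a quota increase through Service Quotas or an AWS Support ticket.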
Practice Questions
1. What is the difference between a partition key and a sort key in DynamoDB?
The partition key (HASH) determines which physical partition stores the item. The sort key (RANGE) orders items within a partition and supports range queries. Together they form the composite primary key.
2. What is the difference between GSI and LSI?
GSI can use any attribute as partition key, can be created anytime, has separate throughput, and is eventually consistent. LSI must use the table’s partition key with a different sort key, must be created at table creation, shares table throughput, and supports strong consistency.
3. What is DAX and when should you use it?
DAX (DynamoDB Accelerator) is an in-memory cache that reduces read latency from milliseconds to microseconds. Use it for read-heavy workloads, especially when you have repeated GetItem queries for the same items (sessions, product details, metadata).
4. Challenge: Write a Python script that queries orders by customer and returns the total spent.
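import boto3
from decimal import Decimal
def get_customer_total_spent(table_name, customer_id):
    """Sum total_amount over all of a customer's orders (assumes each item has a numeric total_amount)."""
    client = boto3.client('dynamodb')
    params = {
        'TableName': table_name,
        'KeyConditionExpression': 'customer_id = :cid',
        'ExpressionAttributeValues': {':cid': {'S': customer_id}},
    }
    total = Decimal('0')  # Decimal avoids float rounding on money values
    while True:
        response = client.query(**params)
        total += sum(Decimal(item['total_amount']['N']) for item in response['Items'])
        # A single Query returns at most 1 MB of data; keep paging until there is no LastEvaluatedKey
        if 'LastEvaluatedKey' not in response:
            return total
        params['ExclusiveStartKey'] = response['LastEvaluatedKey']
5. How does DynamoDB pricing differ from traditional SQL databases?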
DynamoDB charges per read/write request unit (not per server). You either pay an hourly rate for the capacity you provision (provisioned mode) or per million requests (on-demand mode), plus storage. There is no instance to manage. Traditional SQL deployments typically charge per instance-hour, regardless of how much of the instance you use.
Real-World Task: Build a User Session Store
Design a DynamoDB table for managing user sessions — the same pattern Doda Browser uses:
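# Create session table (TTL itself is enabled separately with update-time-to-live)
# Every attribute used in a table or index key schema, including ttl, must appear in --attribute-definitions
aws dynamodb create-table \
--table-name sessions \
--attribute-definitions \
AttributeName=session_id,AttributeType=S \
AttributeName=user_id,AttributeType=S \
AttributeName=ttl,AttributeType=N \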
--key-schema \
AttributeName=session_id,KeyType=HASH \
--global-secondary-indexes '[
{
"IndexName": "user-sessions-index",
"KeySchema": [
{"AttributeName": "user_id", "KeyType": "HASH"},
{"AttributeName": "ttl", "KeyType": "RANGE"}
],
"Projection": {"ProjectionType": "ALL"}
}
]' \
--billing-mode PAY_PER_REQUEST
# Put a session that expires 24 hours after login (ttl is a Unix epoch timestamp in seconds)
aws dynamodb put-item \
--table-name sessions \
--item '{
"session_id": {"S": "sess_b3e99e"},
"user_id": {"S": "user_001"},
"ip_address": {"S": "192.168.1.1"},
"user_agent": {"S": "Mozilla/5.0 Doda Browser"},
"login_time": {"S": "2026-06-07T10:00:00Z"},
"ttl": {"N": "1780912800"}
}'
# Once TTL is enabled on the ttl attribute, DynamoDB deletes expired items in the background, typically within a few days of expiry
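A TTL attribute must hold a Unix epoch timestamp in seconds, so the expiry value has to be computed from the login time. The sketch below shows one way to do that for a 24-hour session, plus a read-side check for sessions that have expired but not yet been removed. The helper names are hypothetical, and the parser assumes the `YYYY-MM-DDTHH:MM:SSZ` format used for `login_time`.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(login_time_iso: str, lifetime: timedelta = timedelta(hours=24)) -> int:
    """Return the expiry time as epoch seconds for a UTC login timestamp."""
    login = datetime.strptime(login_time_iso, "%Y-%m-%dT%H:%M:%SZ")
    login = login.replace(tzinfo=timezone.utc)
    return int((login + lifetime).timestamp())


def is_expired(ttl: int, now: float = None) -> bool:
    """True if the session should be treated as gone, even if still stored."""
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return ttl <= now


expiry = ttl_epoch("2026-06-07T10:00:00Z")  # 1780912800 (2026-06-08T10:00:00Z)
```

Because TTL deletion runs in the background and can lag behind the expiry time, checking `is_expired` when reading a session keeps stale sessions from being accepted.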
Try It Yourself
Using the AWS CLI or console, experiment with these table operations:
# Scan all items (expensive — avoid in production)
aws dynamodb scan --table-name users --limit 10
# Get item count (approximate)
aws dynamodb describe-table --table-name users \
--query 'Table.ItemCount'
# Export table to S3 (requires point-in-time recovery to be enabled on the table)
aws dynamodb export-table-to-point-in-time \
--table-arn "arn:aws:dynamodb:..." \
--s3-bucket my-exports \
--export-format DYNAMODB_JSON
# Monitor table metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name ConsumedReadCapacityUnits \
--dimensions Name=TableName,Value=orders \
--start-time 2026-06-07T00:00:00Z \
--end-time 2026-06-08T00:00:00Z \
--period 300 \
--statistics Sum
# Enable TTL on a table
aws dynamodb update-time-to-live \
--table-name sessions \
--time-to-live-specification Enabled=true,AttributeName=ttl
These monitoring and management patterns help DodaZIP maintain cost-effective, high-performance DynamoDB tables and Durga Antivirus Pro scale its real-time threat detection across millions of active sessions.
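The `Sum` statistic from the CloudWatch command above is the total number of capacity units consumed in each period, so dividing by the period length gives the average units consumed per second, which is the figure to compare against provisioned capacity. The sketch below does that arithmetic on datapoints shaped like the `Datapoints` list in the command's output. The sample values and helper names are made up for illustration.

```python
def units_per_second(datapoints: list, period_seconds: int = 300) -> list:
    """Average consumed capacity units per second for each datapoint."""
    return [dp["Sum"] / period_seconds for dp in datapoints]


def peak_utilization(datapoints: list, provisioned_units: float,
                     period_seconds: int = 300) -> float:
    """Busiest period's average rate as a fraction of provisioned capacity."""
    return max(units_per_second(datapoints, period_seconds)) / provisioned_units


# Hypothetical datapoints for three 5-minute periods
sample = [
    {"Timestamp": "2026-06-07T00:00:00Z", "Sum": 1500.0, "Unit": "Count"},
    {"Timestamp": "2026-06-07T00:05:00Z", "Sum": 3000.0, "Unit": "Count"},
    {"Timestamp": "2026-06-07T00:10:00Z", "Sum": 900.0, "Unit": "Count"},
]
rates = units_per_second(sample)                        # [5.0, 10.0, 3.0]
utilization = peak_utilization(sample, provisioned_units=20)  # 0.5
```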
What’s Next
Congratulations on completing this DynamoDB tutorial! Here’s where to go from here:
- Practice daily — Consistency is more important than long study sessions
- Build a project — Apply what you learned by building something real
- Explore related topics — Check out other tutorials in the same category
- Join the community — Discuss with other learners and share your progress
Remember: every expert was once a beginner. Keep coding!
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.