Cost Anomaly Detection — AWS Cost Anomaly Detection, Azure Alerts, ML-Based Detection
Cloud cost anomalies — unexpected spikes in spending — can double your monthly bill before you notice. This guide covers native anomaly detection tools across AWS, Azure, and GCP, plus custom ML-based approaches and automated remediation workflows.
What You’ll Learn
You’ll configure AWS Cost Anomaly Detection, Azure budget alerts with anomaly evaluation, and GCP cost insights; build custom anomaly detection with Python and ML; and implement automated remediation that pauses or scales down anomalous resources.
Why Cost Anomaly Detection Matters
A single misconfigured resource (an idle GPU instance, a data transfer spike from a DDoS attack, a forgotten development cluster) can cost thousands of dollars per day. Manual cost monitoring doesn’t scale. Automated anomaly detection catches these issues within hours rather than days; billing data itself usually arrives with a delay of several hours, so minute-level detection is not realistic.
Learning Path
flowchart LR
A[Cloud Cost Basics] --> B[Cost Anomaly Detection<br/>You are here]
B --> C[Right-Sizing Strategies]
C --> D[FinOps Practices]
style B fill:#f90,color:#fff
AWS Cost Anomaly Detection
AWS’s native anomaly detection uses machine learning to establish spending baselines and detect deviations:
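# Create a monitor that tracks spend per AWS service (this is what enables detection)
# "service-monitor" is a placeholder name
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName": "service-monitor", "MonitorType": "DIMENSIONAL", "MonitorDimension": "SERVICE"}'
# List all monitors
aws ce get-anomaly-monitors
# Get anomalies for a date range
aws ce get-anomalies \
  --date-interval StartDate=2026-06-01,EndDate=2026-06-20 \
  --monitor-arn "arn:aws:ce::MONITOR_ARN"
# Give feedback on a detected anomaly (YES, NO, or PLANNED_ACTIVITY),
# e.g. a confirmed spike from a marketing campaign
aws ce provide-anomaly-feedback --anomaly-id "abc123" \
  --feedback "YES"
Setting Up via Console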
1. Go to AWS Cost Management → Cost Anomaly Detection.
2. Create a monitor. Choose between:
   - AWS services: monitor spend per service
   - Linked accounts: monitor spend per member account
   - Cost categories: monitor spend per cost category value
   - Cost allocation tags: monitor spend per tag value
3. Configure the alert threshold: a dollar value or a percentage above expected spend (e.g., > $100 or > 50%).
4. Create a subscription: choose individual alerts delivered through an SNS topic (which can fan out to email or Slack), or daily or weekly email summaries.
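The same monitor and subscription can be created from code. The following is a minimal sketch, assuming boto3 and the Cost Explorer `create_anomaly_monitor` and `create_anomaly_subscription` calls; the monitor name, subscription name, SNS topic ARN, and the $100 impact threshold are placeholders, not values from a real account.

```python
def build_monitor(name="service-monitor"):
    """Request body for a monitor that tracks each AWS service separately."""
    return {
        "MonitorName": name,  # placeholder name
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }


def build_subscription(monitor_arn, sns_topic_arn, min_impact_usd=100):
    """Request body for a subscription that publishes each anomaly to SNS."""
    return {
        "SubscriptionName": "cost-anomaly-alerts",  # placeholder name
        "MonitorArnList": [monitor_arn],
        "Subscribers": [{"Type": "SNS", "Address": sns_topic_arn}],
        # IMMEDIATE sends individual alerts; DAILY and WEEKLY are email summaries
        "Frequency": "IMMEDIATE",
        # Alert only when the total impact is at least min_impact_usd dollars
        "ThresholdExpression": {
            "Dimensions": {
                "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                "Values": [str(min_impact_usd)],
            }
        },
    }


def create_monitor_and_subscription(sns_topic_arn):
    """Create both resources; needs AWS credentials with Cost Explorer access."""
    import boto3  # imported here so the builders above work without boto3

    ce = boto3.client("ce", region_name="us-east-1")
    monitor_arn = ce.create_anomaly_monitor(
        AnomalyMonitor=build_monitor()
    )["MonitorArn"]
    response = ce.create_anomaly_subscription(
        AnomalySubscription=build_subscription(monitor_arn, sns_topic_arn)
    )
    return monitor_arn, response["SubscriptionArn"]
```

Keeping the request bodies in plain functions makes them easy to review and unit test before anything is sent to AWS.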
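A dimensional SERVICE monitor evaluates every AWS service (AmazonEC2, AmazonRDS, and so on) separately, so the definition does not list individual services. The monitor ARN is assigned by AWS when the monitor is created:
{
  "MonitorName": "service-monitor",
  "MonitorType": "DIMENSIONAL",
  "MonitorDimension": "SERVICE"
}
Alert Example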
AWS Cost Anomaly Alert — Service: AmazonEC2
Anomaly Score: 89/100
Estimated Impact: $2,345.67 over 7 days
Current Spend: $4,567.89 (normal: $2,222.22 ± $345.67)
Top Driver: ap-southeast-1 m5.24xlarge instance started 2026-06-18
Azure Cost Alerts
Azure provides budget-based alerts with anomaly detection:
# Create a monthly cost budget
# (az consumption budget create has no notification flags; add the alert at
#  80% of actual spend for "ops-team" in the portal or through the Budgets REST API)
az consumption budget create \
  --budget-name "prod-budget" \
  --category cost \
  --amount 50000 \
  --time-grain monthly \
  --start-date 2026-01-01 \
  --end-date 2026-12-31
# List budgets
az consumption budget list
# Create action group for alerts (each --action takes TYPE NAME VALUE;
# "ops-email" and "slack-hook" are placeholder receiver names)
az monitor action-group create \
  --name "CostAlerts" \
  --resource-group "management" \
  --action email ops-email ops@dodatech.com \
  --action webhook slack-hook https://hooks.slack.com/services/...
Azure Anomaly Detection Configuration
{
  "properties": {
    "displayName": "Production Anomaly Monitor",
    "timeGrain": "daily",
    "notificationThresholds": [
      {
        "thresholdType": "forecasted",
        "thresholdValue": 120,
        "notificationType": "email"
      }
    ],
    "dimensions": [
      {"name": "ServiceName", "values": ["virtualMachines", "storage"]}
    ]
  }
}
GCP Cost Insights
GCP provides built-in cost anomaly detection through Recommender:
# List cost insights (cost insights are scoped to a billing account)
gcloud recommender insights list \
  --insight-type=google.billing.CostInsight \
  --billing-account=BILLING_ACCOUNT_ID \
  --location=global
# Describe a specific insight
gcloud recommender insights describe INSIGHT_ID \
  --insight-type=google.billing.CostInsight \
  --billing-account=BILLING_ACCOUNT_ID \
  --location=global
# Set up budget alerts (notify at 50% of the budget)
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="Monthly Budget" \
  --budget-amount=50000USD \
  --threshold-rule=percent=0.5
GCP Anomaly Insight Example
Insight: Unusual spike in Compute Engine costs
Category: cost
Severity: CRITICAL
Observation Period: 2026-06-13 to 2026-06-20
Observed Cost: $12,345 (baseline: $5,432)
Suggested Action: Review recently created VM instances in us-west1-b
Custom ML-Based Detection
For multi-cloud or custom requirements, build your own anomaly detection:
Python Anomaly Detection with Prophet
import json
from datetime import datetime, timedelta

import boto3
import pandas as pd
from prophet import Prophet


class CostAnomalyDetector:
    def __init__(self, sensitivity=0.95):
        # sensitivity is the width of Prophet's uncertainty interval:
        # a wider interval flags fewer points as anomalous
        self.sensitivity = sensitivity
        self.model = Prophet(
            interval_width=sensitivity,
            yearly_seasonality=False,  # 90 days of history cannot show a yearly cycle
            weekly_seasonality=True,
            daily_seasonality=False,
            changepoint_prior_scale=0.05,
            seasonality_prior_scale=10.0
        )

    def fetch_cost_data(self, days=90):
        """Fetch total daily cost from AWS Cost Explorer"""
        client = boto3.client('ce', region_name='us-east-1')
        end = datetime.now()
        start = end - timedelta(days=days)
        # No GroupBy here: with grouping, the per-day 'Total' comes back empty
        response = client.get_cost_and_usage(
            TimePeriod={
                'Start': start.strftime('%Y-%m-%d'),
                'End': end.strftime('%Y-%m-%d')
            },
            Granularity='DAILY',
            Metrics=['UnblendedCost']
        )
        records = []
        for result in response['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = float(result['Total']['UnblendedCost']['Amount'])
            records.append({'ds': date, 'y': total})
        df = pd.DataFrame(records)
        # Prophet's forecast uses datetime64 for 'ds'; match it so the merge works
        df['ds'] = pd.to_datetime(df['ds'])
        return df

    def detect_anomalies(self, df):
        """Detect anomalies using Prophet forecast intervals"""
        self.model.fit(df)
        future = self.model.make_future_dataframe(periods=7)
        forecast = self.model.predict(future)
        # Merge actual with forecast
        merged = df.merge(
            forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']],
            on='ds', how='left'
        )
        # Flag days whose actual cost falls outside the forecast interval
        merged['anomaly'] = (
            (merged['y'] > merged['yhat_upper']) |
            (merged['y'] < merged['yhat_lower'])
        )
        merged['deviation'] = abs(
            (merged['y'] - merged['yhat']) / merged['yhat'] * 100
        )
        anomalies = merged[merged['anomaly']].sort_values(
            'deviation', ascending=False
        )
        return anomalies, forecast

    def send_alert(self, anomaly):
        """Send anomaly alert via SNS"""
        sns = boto3.client('sns')
        message = json.dumps({
            'type': 'cost_anomaly',
            'date': str(anomaly['ds']),
            'actual_cost': float(anomaly['y']),
            'expected_cost': float(anomaly['yhat']),
            'deviation_pct': float(anomaly['deviation']),
            'severity': 'HIGH' if anomaly['deviation'] > 50 else 'MEDIUM'
        })
        sns.publish(
            TopicArn='arn:aws:sns:us-east-1:ACCOUNT:cost-alerts',
            Message=message,
            Subject='Cost Anomaly Detected'
        )


# Usage
detector = CostAnomalyDetector()
cost_data = detector.fetch_cost_data()
anomalies, forecast = detector.detect_anomalies(cost_data)
for _, anomaly in anomalies.head(5).iterrows():
    print(f"{anomaly['ds'].date()}: ${anomaly['y']:.2f} "
          f"(expected ${anomaly['yhat']:.2f}, "
          f"deviation: {anomaly['deviation']:.1f}%)")
    detector.send_alert(anomaly)
Example output:
2026-06-18: $4567.89 (expected $2222.22, deviation: 105.6%)
2026-06-19: $3890.12 (expected $2345.67, deviation: 65.8%)
2026-06-15: $567.89 (expected $1234.56, deviation: 54.0%)
Automated Remediation
AWS Lambda Auto-Remediation
import json
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client('ec2')


def lambda_handler(event, context):
    """Auto-stop suspect instances when anomaly is severe"""
    anomaly = json.loads(event['Records'][0]['Sns']['Message'])
    # Only auto-remediate severe anomalies
    if anomaly['severity'] != 'HIGH':
        return {'status': 'monitoring_only'}
    # Find running instances that opted in through the AutoStop tag
    instances = ec2.describe_instances(
        Filters=[
            {'Name': 'instance-state-name', 'Values': ['running']},
            {'Name': 'tag:AutoStop', 'Values': ['true']}
        ]
    )
    # LaunchTime is timezone-aware (UTC), so compare against an aware cutoff
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    stopped = []
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            # Stop instances launched in the last 24 hours
            if instance['LaunchTime'] > cutoff:
                ec2.stop_instances(InstanceIds=[instance['InstanceId']])
                stopped.append(instance['InstanceId'])
    return {
        'status': 'remediated',
        'stopped_instances': stopped,
        'anomaly_id': anomaly.get('anomaly_id')
    }
Common Anomaly Detection Mistakes
1. Setting Thresholds Too Low
A $10 anomaly alert triggers daily for normal cost fluctuations. Set thresholds based on historical variance — use 2-3 standard deviations from the mean.
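As a rough illustration of the variance-based threshold, here is a minimal sketch using only the standard library. The function name `variance_threshold` and the sample history are made up for the example, and it assumes costs are roughly stable with no strong trend.

```python
from statistics import mean, stdev


def variance_threshold(daily_costs, k=3.0):
    """Return mean + k * sample standard deviation of historical daily costs."""
    if len(daily_costs) < 2:
        raise ValueError("need at least two days of history")
    return mean(daily_costs) + k * stdev(daily_costs)


# Hypothetical history of daily spend in dollars
history = [100.0, 110.0, 90.0, 105.0, 95.0]
threshold = variance_threshold(history, k=2.0)
print(f"alert above ${threshold:.2f}")  # mean 100, stdev about 7.91
print(130.0 > threshold)                # a $130 day exceeds the threshold
```

A fixed $10 threshold would fire on most of the days in this history, while the variance-based one only fires on a day that is clearly outside normal fluctuation.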
2. Ignoring Seasonality
Cloud costs have natural patterns — higher during business hours, lower on weekends. Anomaly detection must account for daily, weekly, and monthly seasonality.
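One simple way to respect weekly seasonality without a full forecasting model is to compare each day against a baseline for the same weekday. The sketch below is an illustration under that assumption; `weekday_baselines`, `is_unusual`, and the synthetic history are hypothetical names and data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median


def weekday_baselines(daily_costs):
    """Map weekday (0 = Monday) to the median cost seen on that weekday."""
    by_weekday = defaultdict(list)
    for day, cost in daily_costs:
        by_weekday[day.weekday()].append(cost)
    return {weekday: median(costs) for weekday, costs in by_weekday.items()}


def is_unusual(day, cost, baselines, ratio=1.5):
    """Flag a cost that exceeds ratio times the baseline for its weekday."""
    baseline = baselines.get(day.weekday())
    return baseline is not None and cost > ratio * baseline


# Two weeks of synthetic history: $1,000 on weekdays, $300 on weekends
start = date(2026, 6, 1)  # a Monday
history = []
for offset in range(14):
    day = start + timedelta(days=offset)
    history.append((day, 300.0 if day.weekday() >= 5 else 1000.0))

baselines = weekday_baselines(history)
print(is_unusual(date(2026, 6, 20), 800.0, baselines))   # Saturday: True
print(is_unusual(date(2026, 6, 22), 1100.0, baselines))  # Monday: False
```

An $800 Saturday is flagged even though it is below a normal weekday, which a single global threshold would miss.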
3. No Automated Remediation
Finding an anomaly is useless if no one acts on it. At minimum, send alerts to Slack/email. For critical anomalies, automate resource pausing or scaling.
4. Single-Cloud Monitoring Only
If you’re multi-cloud, you need a unified anomaly detection system. Use custom ML or a third-party tool (CloudHealth, Vantage) that aggregates across providers.
5. Not Tagging Resources
Anomaly detection is only as good as your data granularity. Untagged resources appear as “unknown” — you can’t determine the owner or purpose of the anomalous spend.
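To see how much of your spend is unattributable, you can measure the untagged share before relying on per-owner anomaly alerts. This is a minimal sketch; the row shape (`cost` plus an optional `tags` dict) and the `owner` tag key are assumptions for illustration, not a provider export format.

```python
def untagged_share(cost_rows, tag_key="owner"):
    """Return the fraction of total cost on rows that lack the given tag."""
    total = sum(row["cost"] for row in cost_rows)
    if total == 0:
        return 0.0
    untagged = sum(
        row["cost"]
        for row in cost_rows
        if not row.get("tags", {}).get(tag_key)
    )
    return untagged / total


# Hypothetical cost rows
rows = [
    {"cost": 600.0, "tags": {"owner": "data-team"}},
    {"cost": 300.0, "tags": {}},
    {"cost": 100.0},
]
print(f"{untagged_share(rows):.0%} of spend has no owner tag")  # 40%
```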
6. Alert Fatigue
Too many false alarms cause alert fatigue. Tune sensitivity, use evaluation periods (confirm the anomaly persists across several consecutive evaluations before escalating), and implement severity levels.
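Persistence checks and severity tiers can be combined in one small function. The sketch below is one possible rule set, not a standard: the name `classify`, the 25% and 50% cutoffs, and the three-evaluation window are all assumptions you would tune to your own data.

```python
def classify(deviations_pct, min_consecutive=3, warn_pct=25.0, critical_pct=50.0):
    """Classify the latest evaluation as 'none', 'info', 'warning', or 'critical'.

    deviations_pct holds the percent deviation from baseline for each
    evaluation, oldest first.
    """
    recent = deviations_pct[-min_consecutive:]
    sustained = (
        len(recent) == min_consecutive
        and all(d >= warn_pct for d in recent)
    )
    if not sustained:
        # A single spike is logged but does not notify anyone
        latest = deviations_pct[-1] if deviations_pct else 0.0
        return "info" if latest >= warn_pct else "none"
    # Sustained anomaly: escalate only if every recent point is severe
    return "critical" if min(recent) >= critical_pct else "warning"


print(classify([5.0, 60.0]))         # info: one spike, not yet sustained
print(classify([30.0, 40.0, 35.0]))  # warning: sustained but moderate
print(classify([55.0, 70.0, 90.0]))  # critical: sustained and severe
```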
7. Not Investigating Root Cause
Stopping an anomalous resource doesn’t prevent recurrence. Always investigate root cause: Was it a developer launching an expensive instance? A CI/CD pipeline with unlimited budget? A misconfigured auto-scaling policy?
Practice Questions
1. How does AWS Cost Anomaly Detection establish baselines? It uses ML to analyze 60+ days of historical spend, accounting for seasonality (daily, weekly, monthly patterns). It creates a prediction interval — spending outside this interval is flagged.
2. What’s the difference between actual and forecasted budget thresholds in Azure? Actual threshold alerts when spend reaches a percentage of the budget. Forecasted threshold alerts when the projected end-of-month spend reaches a percentage. Forecasted alerts catch overspend earlier.
3. Why use Prophet or similar ML models for anomaly detection? Prophet handles seasonality, trend changes, and missing data well. It provides uncertainty intervals essential for anomaly detection. It’s also robust to outliers that would confuse simpler statistical methods.
4. How do you prevent alert fatigue in cost anomaly detection? Use tiered severity (info/warning/critical), require sustained anomalies (e.g., 3 consecutive days), tune the sensitivity threshold, and exclude known cost events (planned launches, campaigns).
5. Challenge: Your organization has accounts in AWS, Azure, and GCP. Design an anomaly detection system that works across all three with centralized alerting and automated remediation. Answer: Use a Python service that pulls cost data from all three providers via their APIs daily. Feed into Prophet for baseline modeling. Send detected anomalies to a central SNS/Slack topic. For remediation, use provider-specific webhooks with common response playbooks (stop instances, reduce instance sizes, notify owners via tags).
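The "common response playbooks" part of the challenge answer can be sketched as a provider-neutral anomaly record plus a dispatch step. Everything here is hypothetical: the `Anomaly` fields, the percentage cutoffs, and the `example.invalid` webhook URLs are placeholders, and the sketch only builds the request instead of sending it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    provider: str        # "aws", "azure", or "gcp"
    resource_id: str
    owner: str           # taken from a resource tag or label
    pct_increase: float  # percent above baseline


# Placeholder endpoints, one remediation webhook per provider
WEBHOOKS = {
    "aws": "https://example.invalid/remediate/aws",
    "azure": "https://example.invalid/remediate/azure",
    "gcp": "https://example.invalid/remediate/gcp",
}


def choose_playbook(anomaly):
    """Pick a provider-neutral response based on the size of the deviation."""
    if anomaly.pct_increase >= 100:
        return "stop_resource"
    if anomaly.pct_increase >= 50:
        return "downsize_resource"
    return "notify_owner"


def build_request(anomaly):
    """Build the webhook call for this anomaly without sending it."""
    return {
        "url": WEBHOOKS[anomaly.provider],
        "body": {
            "action": choose_playbook(anomaly),
            "resource_id": anomaly.resource_id,
            "owner": anomaly.owner,
        },
    }


request = build_request(Anomaly("aws", "i-0abc", "team-data", 120.0))
print(request["body"]["action"])  # stop_resource
```

Keeping the playbook choice separate from the provider-specific endpoint means the escalation rules stay identical across clouds.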
Mini Project: Multi-Cloud Cost Monitor
Create a unified cost anomaly detector:
#!/usr/bin/env python3
# cost_monitor.py: multi-cloud cost anomaly detection
# Requires: boto3, azure-identity, azure-mgmt-consumption, google-cloud-billing
import json
import os
from collections import defaultdict
from datetime import datetime, timedelta


def with_baselines(cost_by_date):
    """Turn {date: cost} into sorted rows with a baseline per day.

    Assumed baseline: the mean cost of all earlier days in the window.
    """
    rows, running_total = [], 0.0
    for i, date in enumerate(sorted(cost_by_date)):
        cost = cost_by_date[date]
        baseline = running_total / i if i else cost
        rows.append({'date': date, 'cost': cost, 'baseline': baseline})
        running_total += cost
    return {'daily_costs': rows}


class CloudCostMonitor:
    """Fetches cost data from AWS, Azure, and GCP"""

    def fetch_aws_costs(self, days=30):
        """Fetch daily AWS costs"""
        import boto3
        client = boto3.client('ce', region_name='us-east-1')
        end = datetime.now()
        start = end - timedelta(days=days)
        # No GroupBy: with grouping, the per-day 'Total' comes back empty
        response = client.get_cost_and_usage(
            TimePeriod={
                'Start': start.strftime('%Y-%m-%d'),
                'End': end.strftime('%Y-%m-%d')
            },
            Granularity='DAILY',
            Metrics=['UnblendedCost']
        )
        return self._parse_ce_response(response)

    def _parse_ce_response(self, response):
        return with_baselines({
            r['TimePeriod']['Start']: float(r['Total']['UnblendedCost']['Amount'])
            for r in response['ResultsByTime']
        })

    def fetch_azure_costs(self, days=30):
        """Fetch daily Azure costs"""
        from azure.identity import DefaultAzureCredential
        from azure.mgmt.consumption import ConsumptionManagementClient
        subscription_id = os.environ['AZURE_SUBSCRIPTION_ID']
        client = ConsumptionManagementClient(
            DefaultAzureCredential(), subscription_id
        )
        scope = f"/subscriptions/{subscription_id}"
        start = (datetime.now() - timedelta(days=days)).strftime('%Y-%m-%d')
        usage = client.usage_details.list(
            scope,
            filter=f"properties/usageStart ge '{start}'",
            expand='properties/additionalProperties'
        )
        return self._parse_azure_usage(usage)

    def _parse_azure_usage(self, usage):
        # Assumes legacy usage detail records, which expose .date and .cost
        totals = defaultdict(float)
        for item in usage:
            totals[item.date.strftime('%Y-%m-%d')] += float(item.cost)
        return with_baselines(totals)

    def fetch_gcp_costs(self, days=30):
        """Fetch daily GCP costs"""
        from google.cloud import billing
        client = billing.CloudBillingClient()
        project = f"projects/{os.environ['GCP_PROJECT_ID']}"
        response = client.get_project_billing_info(
            request={"name": project}
        )
        return self._parse_gcp_billing(response)

    def _parse_gcp_billing(self, billing_info):
        # get_project_billing_info only reports which billing account the
        # project uses, not spend. Daily cost rows have to come from the
        # BigQuery billing export; until that query is added, report no data.
        return with_baselines({})

    def analyze_and_alert(self):
        """Analyze costs across all clouds and send alerts"""
        results = {
            'timestamp': datetime.now().isoformat(),
            'anomalies': [],
            'total_spend': {}
        }
        for provider, fetcher in [
            ('aws', self.fetch_aws_costs),
            ('azure', self.fetch_azure_costs),
            ('gcp', self.fetch_gcp_costs)
        ]:
            try:
                data = fetcher()
                results['total_spend'][provider] = sum(
                    r['cost'] for r in data['daily_costs'][-7:]
                )
                # Simple threshold check: 50% above baseline in the last 3 days
                for day in data['daily_costs'][-3:]:
                    if day['baseline'] > 0 and day['cost'] > day['baseline'] * 1.5:
                        results['anomalies'].append({
                            'provider': provider,
                            'date': day['date'],
                            'cost': day['cost'],
                            'baseline': day['baseline'],
                            'pct_increase': (
                                (day['cost'] - day['baseline'])
                                / day['baseline'] * 100
                            )
                        })
            except Exception as e:
                print(f"Error fetching {provider}: {e}")
        return results


if __name__ == '__main__':
    monitor = CloudCostMonitor()
    report = monitor.analyze_and_alert()
    print(json.dumps(report, indent=2))
    if report['anomalies']:
        print(f"\n⚠ {len(report['anomalies'])} anomalies detected!")
    else:
        print("\n✓ No anomalies detected.")
FAQ
What’s Next
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.