Is AWS Down? Complete Status Check Guide + Quick Fixes
EC2 instances not responding?
S3 buckets timing out?
Lambda functions failing?
Before panicking, verify whether AWS is actually down or whether it's a configuration issue on your end. Here's your complete guide to checking AWS status and responding to outages.
Quick Check: Is AWS Actually Down?
Don't assume it's AWS. Many "AWS down" reports are actually configuration errors, quota limits, or region-specific issues that can be resolved quickly.
1. Check Official Sources
AWS Service Health Dashboard:
health.aws.amazon.com/health/status
What to look for:
- Green checkmarks = Service operational
- Yellow indicators = Service degradation
- Red indicators = Service disruption
- Recent events = Click for details
Shows status for:
- EC2 (Compute)
- S3 (Storage)
- Lambda (Serverless)
- RDS (Databases)
- CloudFront (CDN)
- Route 53 (DNS)
- All AWS regions
API Status Check:
apistatuscheck.com/api/aws
Why use it:
- Real-time monitoring (checks every 5 minutes)
- Historical uptime data
- Instant alerts (Slack, Discord, email)
- Tracks individual services separately
- Multi-region monitoring
Twitter/X Search:
Search "AWS down" on Twitter
Why it works:
- DevOps teams report outages instantly
- AWS support responds here
- See which regions affected
- Identify specific services down
Pro tip: Search specific services: "EC2 down", "S3 down us-east-1", etc.
2. Check Region-Specific Status
AWS operates in multiple regions worldwide:
| Region Code | Location | Common Name |
|---|---|---|
| us-east-1 | N. Virginia | US East (most common) |
| us-east-2 | Ohio | US East 2 |
| us-west-1 | N. California | US West |
| us-west-2 | Oregon | US West 2 |
| eu-west-1 | Ireland | Europe |
| eu-central-1 | Frankfurt | Europe Central |
| ap-southeast-1 | Singapore | Asia Pacific |
| ap-northeast-1 | Tokyo | Asia Pacific |
| sa-east-1 | SΓ£o Paulo | South America |
Critical insight: AWS outages are almost always region-specific. us-east-1 can be down while us-west-2 is fine.
How to check your region:
- AWS Console → Top-right dropdown shows current region
- Check your resource configurations
- Look at health.aws.amazon.com region-by-region
Best practice: Deploy to multiple regions for redundancy.
3. Check Service-Specific Status
AWS has 200+ services. Focus on the major ones:
| Service | What It Does | Most Common Issues |
|---|---|---|
| EC2 | Virtual servers | Instance launch failures, connectivity |
| S3 | Object storage | High error rates, slow responses |
| Lambda | Serverless compute | Invocation failures, timeouts |
| RDS | Managed databases | Connection failures, slow queries |
| CloudFront | CDN | Cache misses, edge location issues |
| Route 53 | DNS | Resolution failures (rare) |
Your service might be down while AWS globally is up.
Common AWS Error Messages (And What They Mean)
EC2: "InsufficientInstanceCapacity"
What it means: AWS doesn't have enough physical capacity in that availability zone.
Causes:
- High demand in specific AZ
- Instance type shortage
- Spot instance availability
Quick fixes:
- Try different availability zone (us-east-1a → us-east-1b)
- Try different instance type (m5.large → m5a.large)
- Wait 30-60 minutes and retry
- Use different region temporarily
Long-term fix: Use Auto Scaling with multiple AZs.
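The zone/type fallbacks above can also be scripted. This is a minimal sketch using boto3; the AMI ID and zone names in the example call are hypothetical, and the actual launch requires AWS credentials.

```python
def fallback_order(zones, instance_types):
    """Yield (zone, instance_type) pairs: try every zone for one
    instance type before falling back to the next type."""
    for itype in instance_types:
        for zone in zones:
            yield zone, itype

def launch_with_fallback(zones, instance_types, ami):
    """Try run_instances across AZs and instance types until one
    succeeds. Requires AWS credentials to actually run."""
    import boto3
    from botocore.exceptions import ClientError
    ec2 = boto3.client("ec2")
    for zone, itype in fallback_order(zones, instance_types):
        try:
            return ec2.run_instances(
                ImageId=ami, InstanceType=itype,
                MinCount=1, MaxCount=1,
                Placement={"AvailabilityZone": zone},
            )
        except ClientError as e:
            # Only swallow capacity errors; surface everything else
            if e.response["Error"]["Code"] != "InsufficientInstanceCapacity":
                raise
    raise RuntimeError("No capacity in any zone/type combination")

# Example call (hypothetical AMI ID):
# launch_with_fallback(["us-east-1a", "us-east-1b"],
#                      ["m5.large", "m5a.large"], "ami-12345678")
```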
S3: "503 Service Unavailable" or "SlowDown"
What it means: S3 is throttling requests or overloaded.
Causes:
- Too many requests to same prefix
- S3 service degradation
- Regional outage
Quick fixes:
- Implement exponential backoff (retry with increasing delays)
- Check S3 request rate limits
- Distribute requests across key prefixes
- Check AWS Status for S3 issues
Code example (exponential backoff):
import random
import time

import boto3
from botocore.exceptions import ClientError

def s3_get_with_retry(bucket, key, max_retries=5):
    s3 = boto3.client('s3')
    for i in range(max_retries):
        try:
            return s3.get_object(Bucket=bucket, Key=key)
        except ClientError as e:
            # boto3 surfaces S3 throttling as 'SlowDown' (HTTP 503)
            if e.response['Error']['Code'] in ('SlowDown', '503'):
                time.sleep(2 ** i + random.random())  # backoff + jitter
            else:
                raise
    raise Exception("Max retries exceeded")
Lambda: "Rate Exceeded" or "TooManyRequestsException"
What it means: Hit Lambda concurrency limits.
Causes:
- Account-level concurrent execution limit (default: 1000)
- Reserved concurrency limit
- Burst limit exceeded
Quick fixes:
- Check Lambda console → Throttles metric
- Request concurrency limit increase (AWS Support)
- Implement queue (SQS) to smooth traffic
- Check if specific function has reserved concurrency set too low
Check current limits:
aws lambda get-account-settings --region us-east-1
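The same check works from code. A minimal boto3 sketch, assuming the us-east-1 region as an example; the summary function requires AWS credentials to run, while the helper is pure:

```python
def unreserved_concurrency(settings):
    """Extract the unreserved concurrent-execution headroom from a
    get-account-settings response."""
    return settings["AccountLimit"]["UnreservedConcurrentExecutions"]

def print_concurrency_summary():
    """Print account-level Lambda concurrency limits.
    Requires AWS credentials; region is an example."""
    import boto3
    client = boto3.client("lambda", region_name="us-east-1")
    settings = client.get_account_settings()
    print("Account limit:", settings["AccountLimit"]["ConcurrentExecutions"])
    print("Unreserved:   ", unreserved_concurrency(settings))
```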
RDS: "Cannot Connect to Database"
What it means: Can't reach RDS instance.
Causes:
- Security group blocking access
- RDS instance stopped/terminated
- Network connectivity issue
- Regional outage
Quick fixes:
- Check RDS instance status (Console → RDS → Databases)
- Verify security group allows your IP (port 3306 for MySQL, 5432 for PostgreSQL)
- Check VPC routing/subnet configuration
- Test from EC2 instance in same VPC
- Check AWS Status for RDS issues
Test connection from EC2:
# MySQL
mysql -h your-rds-endpoint.rds.amazonaws.com -u admin -p
# PostgreSQL
psql -h your-rds-endpoint.rds.amazonaws.com -U admin -d mydb
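If no database client is installed, a plain TCP check answers the "is the port reachable at all?" question. A stdlib-only sketch; the endpoint in the example is hypothetical:

```python
import socket

def port_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within
    the timeout; False on refusal, timeout, or DNS failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (substitute your actual RDS endpoint and port):
# port_reachable("your-rds-endpoint.rds.amazonaws.com", 3306)
```

A False result with a running instance usually points at the security group or VPC routing rather than the database itself.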
CloudFront: "502 Bad Gateway" or "504 Gateway Timeout"
What it means: CloudFront can't reach your origin server.
Causes:
- Origin server down (S3, EC2, ALB)
- Origin timeout too short
- SSL/TLS certificate issues
- Origin security group blocking CloudFront IPs
Quick fixes:
- Check origin server health
- Verify origin domain/IP is correct (CloudFront console)
- Check origin response time (should be < 30 sec)
- Whitelist CloudFront IP ranges in security groups
- Check SSL certificate validity
Get CloudFront IP ranges:
curl https://ip-ranges.amazonaws.com/ip-ranges.json | grep CLOUDFRONT
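For automation, the same JSON document can be filtered properly instead of grepped. A stdlib-only sketch:

```python
import json
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

def cloudfront_prefixes(ranges):
    """Pull the IPv4 CIDR blocks tagged CLOUDFRONT out of the
    parsed ip-ranges.json document."""
    return [p["ip_prefix"] for p in ranges["prefixes"]
            if p["service"] == "CLOUDFRONT"]

def fetch_cloudfront_prefixes():
    """Download and filter the published AWS IP ranges."""
    with urllib.request.urlopen(RANGES_URL) as resp:
        return cloudfront_prefixes(json.load(resp))
```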
Route 53: DNS Resolution Failures
What it means: DNS queries not resolving (very rare).
Causes:
- Hosted zone misconfigured
- Record set errors
- Health check failures causing failover
- Actual Route 53 outage (extremely rare)
Quick fixes:
- Test DNS resolution: dig yourdomain.com or nslookup yourdomain.com
- Check Route 53 hosted zone records (Console → Route 53)
- Verify nameservers match (domain registrar = Route 53 nameservers)
- Check health check status
- Check AWS Status for Route 53 issues
Test DNS from multiple locations:
# Using dig
dig @8.8.8.8 yourdomain.com
# Using nslookup
nslookup yourdomain.com 8.8.8.8
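The same resolution check can run from a script. Note one difference from dig/nslookup: the stdlib goes through the system resolver, so you cannot point it at a specific DNS server like 8.8.8.8:

```python
import socket

def resolve_ipv4(hostname):
    """Resolve a hostname to its IPv4 addresses via the system
    resolver. Raises socket.gaierror on resolution failure."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})

# Example:
# resolve_ipv4("yourdomain.com")
```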
Quick Fixes: AWS Service Issues
Fix #1: Check AWS Personal Health Dashboard
First stop for AWS issues.
How to access:
- AWS Console → Search "Health"
- Or visit: console.aws.amazon.com/health
What you'll see:
- Issues affecting YOUR resources
- Scheduled maintenance events
- Recent events history
- Affected resources list
Action items:
- Read event details
- Check "Affected resources" tab
- Follow AWS recommendations
- Set up email/SNS notifications
Fix #2: Verify Region Selection
Wrong region = resources "disappear"
Check current region:
- Top-right corner of AWS Console
- Should match where you created resources
Common mistake:
- Created EC2 in us-east-1
- Console switched to us-west-2
- "Where did my instances go?!"
Fix:
- Switch to correct region in dropdown
- Set up AWS CLI default region:
aws configure set region us-east-1
Fix #3: Check Service Quotas/Limits
AWS has limits on everything.
Common limits:
- EC2 On-Demand instances per region (quotas are now vCPU-based)
- S3 bucket names (globally unique)
- Lambda concurrent executions (default: 1,000)
- EBS storage per region (measured in TiB per volume type)
Check quotas:
- AWS Console → Service Quotas
- Search for service (e.g., "EC2")
- See current limit vs. usage
- Request increase if needed
Via CLI:
aws service-quotas list-service-quotas --service-code ec2
Pro tip: Request limit increases BEFORE you need them (can take 24-48 hours).
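A periodic script can flag quotas worth increasing early. A boto3 sketch, assuming us-east-1 as an example region; the listing requires AWS credentials, while the threshold helper is pure:

```python
def near_limit(usage, quota, threshold=0.8):
    """True when current usage is at or above the threshold fraction
    of the quota, a signal to request an increase early."""
    return quota > 0 and usage >= quota * threshold

def list_ec2_quotas():
    """Print EC2 quota names and values. Requires AWS credentials."""
    import boto3
    sq = boto3.client("service-quotas", region_name="us-east-1")
    paginator = sq.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            print(f"{quota['QuotaName']}: {quota['Value']}")
```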
Fix #4: Implement Retry Logic with Exponential Backoff
AWS recommends exponential backoff for all API calls.
Why:
- Handles temporary failures
- Respects throttling
- Improves reliability
Implementation (Python boto3):
from botocore.config import Config
import boto3
# Configure automatic retries
config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'  # or 'standard'
    }
)
# Use with any AWS client
s3 = boto3.client('s3', config=config)
ec2 = boto3.client('ec2', config=config)
JavaScript (AWS SDK v3):
import { S3Client } from "@aws-sdk/client-s3";
const client = new S3Client({
  maxAttempts: 10,
  retryMode: "adaptive"
});
Fix #5: Check CloudWatch Metrics
CloudWatch shows what's actually happening.
Key metrics to check:
EC2:
- CPUUtilization
- StatusCheckFailed
- NetworkIn/NetworkOut
S3:
- 4xxErrors, 5xxErrors
- AllRequests
- BytesDownloaded
Lambda:
- Invocations
- Errors
- Throttles
- Duration
RDS:
- CPUUtilization
- DatabaseConnections
- ReadLatency, WriteLatency
How to access:
- AWS Console → CloudWatch → Metrics
- Select namespace (AWS/EC2, AWS/S3, etc.)
- Graph metrics for last 1-24 hours
- Look for spikes/drops
CLI example:
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--start-time 2026-02-07T00:00:00Z \
--end-time 2026-02-07T23:59:59Z \
--period 3600 \
--statistics Average
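The same query in boto3, with the time window computed instead of hard-coded. The instance ID and region are example values, and the fetch itself requires AWS credentials:

```python
from datetime import datetime, timedelta, timezone

def trailing_window(hours=24):
    """Return (start, end) UTC timestamps for the trailing N hours."""
    end = datetime.now(timezone.utc)
    return end - timedelta(hours=hours), end

def fetch_cpu_average(instance_id):
    """Hourly average CPUUtilization for the past day.
    Requires AWS credentials; region is an example."""
    import boto3
    cw = boto3.client("cloudwatch", region_name="us-east-1")
    start, end = trailing_window()
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start, EndTime=end,
        Period=3600, Statistics=["Average"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
```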
Fix #6: Check Security Groups and NACLs
Most connectivity issues = security group misconfiguration.
Security Groups (instance-level firewall):
Check rules:
- EC2 Console → Security Groups
- Find relevant group
- Check Inbound rules (incoming traffic)
- Check Outbound rules (outgoing traffic)
Common issues:
- SSH (port 22) not allowed from your IP
- HTTP/HTTPS (80/443) not open to 0.0.0.0/0
- RDS port not open to application security group
- Forgot to allow outbound traffic (rare, but happens)
Quick fix for testing:
- Temporarily allow all traffic: 0.0.0.0/0 on all ports
- If it works, narrow down to specific ports/IPs
- NEVER leave wide open in production
Network ACLs (subnet-level firewall):
- Usually left at default (allow all)
- Check if someone modified them
- VPC → Network ACLs
Fix #7: Check IAM Permissions
"Access Denied" errors = IAM issue, not AWS down.
Troubleshoot:
1. Check who you are:
aws sts get-caller-identity
2. Test specific permission:
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:user/YourUser \
--action-names s3:GetObject \
--resource-arns arn:aws:s3:::your-bucket/*
3. Check CloudTrail for denied actions:
- CloudTrail → Event history
- Filter: "Error code = AccessDenied"
- See exactly which permission is missing
Common fixes:
- Attach policy with required permissions
- Add resource to existing policy
- Check if MFA required
- Verify you're using correct AWS account
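Steps 1 and 2 above can be combined in a script. A boto3 sketch, assuming you are signed in as an IAM user (an assumed-role session needs the underlying role ARN instead of the STS caller ARN); the bucket name is hypothetical:

```python
def denied_actions(simulation):
    """Return action names that did not evaluate to 'allowed' in a
    simulate-principal-policy response."""
    return [r["EvalActionName"] for r in simulation["EvaluationResults"]
            if r["EvalDecision"] != "allowed"]

def check_s3_read(bucket):
    """Simulate s3:GetObject for the current caller.
    Requires AWS credentials; bucket name is hypothetical."""
    import boto3
    arn = boto3.client("sts").get_caller_identity()["Arn"]
    sim = boto3.client("iam").simulate_principal_policy(
        PolicySourceArn=arn,
        ActionNames=["s3:GetObject"],
        ResourceArns=[f"arn:aws:s3:::{bucket}/*"],
    )
    return denied_actions(sim)
```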
Fix #8: Use AWS Support (If You Have a Plan)
AWS Support tiers:
| Plan | Response Time | Cost |
|---|---|---|
| Basic | No tech support | Free |
| Developer | 12-24 hours | $29/month |
| Business | 1 hour (critical) | $100+/month |
| Enterprise | 15 minutes (critical) | $15,000+/month |
When to contact support:
- Service limits need increasing
- Billing issues
- Technical issues you can't resolve
- Account or security issues
How to open case:
- AWS Console → Support → Create case
- Choose category (Service limit, technical, billing)
- Describe issue with details
- Attach CloudWatch graphs, error messages
Pro tip: Include AWS request IDs from error messages (speeds up troubleshooting).
EC2 Not Working?
Issue: Can't Connect to EC2 Instance
Troubleshoot:
1. Check instance state:
- EC2 Console → Instances
- Should be "running" (green)
- "stopped" = start it
- "terminated" = it's gone, launch new one
2. Check security group:
- Select instance → Security tab
- Click security group name
- Inbound rules should include:
- SSH (port 22) from your IP for Linux
- RDP (port 3389) from your IP for Windows
3. Test network connectivity:
# Ping (if ICMP allowed)
ping ec2-xx-xx-xx-xx.compute.amazonaws.com
# Test SSH port
telnet ec2-xx-xx-xx-xx.compute.amazonaws.com 22
# Or
nc -zv ec2-xx-xx-xx-xx.compute.amazonaws.com 22
4. Check if you have correct key:
- SSH requires .pem key file
- Key must match what you selected at launch
- Key permissions must be 400:
chmod 400 your-key.pem
5. Check System Status Checks:
- EC2 Console → Instance → Status checks tab
- "2/2 checks passed" = healthy
- Failed checks = hardware/network issue → Reboot or contact AWS
Issue: EC2 Instance Slow or Unresponsive
Causes:
- CPU throttling (T instance credits exhausted)
- Memory exhausted
- Disk I/O bottleneck
- Network saturation
Troubleshoot:
1. Check CloudWatch metrics:
- CPU, Network, Disk I/O graphs
- Look for maxed out metrics
2. For T instances (T2, T3, T4g), check CPU credits:
- CloudWatch → Metrics → EC2 → Per-Instance Metrics
- CPUCreditBalance
- If near zero, you're being throttled
Solutions:
- Switch to unlimited mode (costs more but no throttling)
- Upgrade to M, C, or R instance type
- Optimize application
3. Connect via EC2 Instance Connect or Session Manager:
- Browser-based console access (no SSH needed)
- EC2 Console → Instance → Connect button
S3 Not Working?
Issue: S3 Bucket Access Denied
Causes:
- Bucket policy blocking access
- IAM permissions missing
- Bucket in different region
- Bucket doesn't exist
Troubleshoot:
1. Check bucket exists:
aws s3 ls s3://your-bucket-name
2. Check bucket region:
aws s3api get-bucket-location --bucket your-bucket-name
3. Check bucket policy:
- S3 Console → Bucket → Permissions → Bucket policy
- Look for "Deny" statements
4. Check IAM permissions:
- Need s3:GetObject, s3:PutObject, s3:ListBucket, etc.
5. Check Block Public Access settings:
- S3 Console → Bucket → Permissions → Block public access
- May need to disable for public buckets
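The existence and region checks above can be scripted together, which also distinguishes "bucket missing" from "access denied". A boto3 sketch; the actual calls require AWS credentials:

```python
def bucket_region(location_response):
    """Normalize get-bucket-location output: buckets in us-east-1
    report a LocationConstraint of None."""
    return location_response.get("LocationConstraint") or "us-east-1"

def diagnose_bucket(bucket):
    """Report whether a bucket is reachable and where it lives.
    Requires AWS credentials; substitute your bucket name."""
    import boto3
    from botocore.exceptions import ClientError
    s3 = boto3.client("s3")
    try:
        s3.head_bucket(Bucket=bucket)  # 404 = missing, 403 = denied
    except ClientError as e:
        return f"head_bucket failed: {e.response['Error']['Code']}"
    loc = s3.get_bucket_location(Bucket=bucket)
    return f"bucket OK, region: {bucket_region(loc)}"
```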
Issue: S3 High Error Rates
Check Service Health Dashboard:
- health.aws.amazon.com → S3
- Look for your region
Implement retry logic:
- See Fix #4 above
Optimize request patterns:
- Distribute across key prefixes (avoid sequential keys)
- Use CloudFront for frequently accessed objects
- Enable S3 Transfer Acceleration for uploads
Lambda Not Working?
Issue: Lambda Timeouts
Causes:
- Function timeout too short (default: 3 sec, max: 15 min)
- Slow dependencies (database, API calls)
- Cold starts
- VPC networking delays
Quick fixes:
1. Increase timeout:
- Lambda Console → Function → Configuration → General
- Set timeout higher (but find root cause)
2. Check CloudWatch Logs:
- Lambda Console → Function → Monitor → View logs in CloudWatch
- See exactly where function is slow
3. Optimize function:
- Reduce package size
- Increase memory (also increases CPU)
- Remove VPC if not needed (VPC adds latency)
- Use provisioned concurrency for critical functions
Issue: Lambda "Function Not Found"
Causes:
- Function in wrong region
- Function deleted
- Wrong function name
Quick fixes:
- Check region (top-right dropdown)
- List functions: aws lambda list-functions
- Verify function ARN
When AWS Actually Goes Down
What Happens
Major AWS outages (recent):
- December 2021: us-east-1 outage (7 hours) - networking issue
- July 2022: us-east-1 power issue (2 hours)
- June 2023: us-east-1 EC2 API issues (3 hours)
Typical causes:
- Power issues at data centers
- Networking failures
- Software deployment bugs
- Rare: DDoS attacks
Impact:
- Regional (usually just one region)
- Service-specific (EC2 down, but S3 works)
- Cascading failures (one service depends on another)
How AWS Responds
Communication:
- AWS Service Health Dashboard
- @AWSSupport on Twitter
- Personal Health Dashboard notifications
- Post-incident reports (PIR) published later
Timeline:
- 0-15 min: Users report issues on Twitter
- 15-30 min: AWS acknowledges on dashboard
- 30-90 min: Regular updates
- Resolution: Hours to days for major outages
- Post-mortem: Detailed PIR published weeks later
What to Do During Outages
1. Activate failover (if configured):
- Switch to different region
- Use read replicas for databases
- Activate standby resources
2. Monitor Personal Health Dashboard:
- Shows YOUR affected resources
- Provides specific guidance
3. Communicate with stakeholders:
- Update status page
- Notify customers
- Set expectations
4. Document incident:
- Screenshot error messages
- Save CloudWatch graphs
- Note timeline
- Use for post-mortem
5. Consider SLA credits:
- AWS SLAs vary by service (e.g., 99.99% for EC2 at the region level, 99.9% for S3 Standard)
- If missed, request service credits
- Submit within 30 days of incident
AWS Down Checklist
Follow these steps in order:
Step 1: Verify it's actually AWS
- Check AWS Service Health Dashboard
- Check AWS Personal Health Dashboard
- Search Twitter: "AWS down [region]"
- Check specific service status
- Verify correct region selected
Step 2: Service-specific checks
- EC2: Check instance status, security groups
- S3: Test bucket access, check error rates
- Lambda: Check CloudWatch logs, metrics
- RDS: Test connection, check instance status
- CloudFront: Check origin health
- Route 53: Test DNS resolution
Step 3: Configuration troubleshooting
- Check security groups/NACLs
- Verify IAM permissions
- Check service quotas/limits
- Review CloudWatch metrics
- Check CloudTrail for errors
Step 4: Implement workarounds
- Add retry logic with exponential backoff
- Failover to different region (if multi-region)
- Use alternate service (e.g., S3 → CloudFront)
- Scale resources if capacity issue
Step 5: Contact AWS (if needed)
- Open AWS Support case
- Include request IDs, error messages
- Attach CloudWatch graphs
- Escalate if critical
Prevent Future Issues
1. Design for Failure
AWS Best Practices:
Multi-AZ deployment:
Single AZ = single point of failure
Multi-AZ = survives data center failure
Multi-Region for critical workloads:
- Active-active or active-passive
- Route 53 health checks + failover
- Cross-region replication (S3, RDS, DynamoDB)
Example architecture:
+------------------+       +------------------+
|    us-east-1     |       |    us-west-2     |
|    (Primary)     |<----->|    (Backup)      |
|                  |       |                  |
|  EC2 Auto Scale  |       |  EC2 Auto Scale  |
|  RDS Multi-AZ    |       |  RDS Read Rep    |
|  S3 (CRR ->)     |       |  S3 (<- CRR)     |
+------------------+       +------------------+
         ^                          ^
         |                          |
      Route 53 (health check + failover)
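The Route 53 failover pair in this architecture can be defined programmatically. A boto3 sketch; the hosted zone ID, record name, IPs, and health check ID are all hypothetical, and applying the change requires AWS credentials:

```python
def failover_change(name, ip, role, health_check_id=None):
    """Build an UPSERT for a Route 53 failover A record.
    role is 'PRIMARY' or 'SECONDARY'."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

def apply_failover(zone_id, changes):
    """Submit the change batch. Requires AWS credentials;
    zone_id is hypothetical."""
    import boto3
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=zone_id, ChangeBatch={"Changes": changes},
    )

# Example pair: primary answers while its health check passes,
# secondary takes over when it fails.
# apply_failover("Z123EXAMPLE", [
#     failover_change("app.example.com.", "192.0.2.10", "PRIMARY",
#                     health_check_id="hc-primary-id"),
#     failover_change("app.example.com.", "198.51.100.10", "SECONDARY"),
# ])
```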
2. Implement Monitoring and Alerts
CloudWatch Alarms:
Critical alarms to set up:
- EC2 StatusCheckFailed
- RDS DatabaseConnections > threshold
- Lambda Errors > threshold
- S3 4xxErrors or 5xxErrors spike
- ALB TargetResponseTime > threshold
Example alarm (CLI):
aws cloudwatch put-metric-alarm \
--alarm-name ec2-cpu-high \
--alarm-description "Alert if CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
Third-party monitoring:
- API Status Check - External monitoring
- Datadog, New Relic, Dynatrace - APM
- PagerDuty - Incident management
3. Use AWS Health API
Automate health check monitoring:
import boto3

# Note: the AWS Health API requires a Business, Enterprise On-Ramp,
# or Enterprise Support plan.
health = boto3.client('health', region_name='us-east-1')

# Get all open issues
events = health.describe_events(
    filter={
        'eventStatusCodes': ['open', 'upcoming']
    }
)

for event in events['events']:
    print(f"Service: {event['service']}")
    print(f"Region: {event.get('region', 'GLOBAL')}")
    print(f"Status: {event['eventStatusCode']}")
    print(f"Type: {event['eventTypeCode']}")
Set up SNS notifications:
- Personal Health Dashboard → Preferences
- Configure email/SMS for events
4. Regular DR Drills
Disaster Recovery testing:
Quarterly exercises:
- Simulate region failure
- Failover to backup region
- Test recovery time
- Document issues found
- Update runbooks
GameDay exercises:
- AWS hosts GameDay events
- Simulate real outage scenarios
- Practice incident response
- Improve team coordination
5. Keep Service Quotas Ahead
Proactive limit increases:
Before Black Friday, product launches, etc.:
- Review current usage
- Project peak demand
- Request quota increases 2-4 weeks early
- Confirm increases before event
Auto-scaling quotas:
- Make sure auto-scaling limits match instance quotas
- Request limits 2x peak demand (headroom)
Key Takeaways
Before assuming AWS is down:
- Check AWS Service Health Dashboard
- Check Personal Health Dashboard
- Verify correct region selected
- Search Twitter for "AWS down [region]"
- Test specific service (EC2, S3, Lambda, etc.)
Common fixes:
- Check security groups (most connectivity issues)
- Verify IAM permissions (most access denied errors)
- Check service quotas (hit limits)
- Implement retry logic with exponential backoff
- Review CloudWatch metrics and logs
Service-specific issues:
- EC2: Security groups, status checks, instance capacity
- S3: Bucket policies, retry logic, key distribution
- Lambda: Timeouts, concurrency limits, CloudWatch logs
- RDS: Security groups, connection limits, Multi-AZ
- CloudFront: Origin health, SSL certificates
- Route 53: DNS records, health checks (rarely down)
If AWS is actually down:
- Monitor Health Dashboard for updates
- Activate failover to different region (if configured)
- Communicate with stakeholders
- Document incident for post-mortem
- Consider requesting SLA credits
Prevent future issues:
- Design multi-AZ/multi-region architecture
- Set up CloudWatch alarms
- Use Personal Health Dashboard API
- Practice DR drills quarterly
- Request service quota increases proactively
Remember: Most AWS issues are configuration errors or hitting limits, not actual AWS outages. Check security groups, IAM permissions, and quotas first.
Need real-time AWS status monitoring? Track AWS uptime with API Status Check - Get instant alerts when AWS services go down.
Related Resources
- Is AWS Down Right Now? - Live status check
- AWS Outage History - Past incidents and timeline
- AWS vs Azure Uptime - Which cloud is more reliable?
- Multi-Region DR Strategy - Build resilient cloud architecture
Monitor Your APIs
Check the real-time status of 100+ popular APIs used by developers.
View API Status →