Understanding API SLAs: What 99.9% Uptime Really Means
"99.9% uptime guaranteed!" sounds great until you do the math. That's 8.76 hours of downtime per year—or 43.8 minutes per month. For a payment API like Stripe, that could mean thousands of failed transactions.
Most developers glance at SLA numbers without understanding what they actually mean. Then downtime hits, revenue tanks, and they realize the fine print matters.
Here's everything you need to know about API SLAs—and how to avoid getting burned.
What is an API SLA?
SLA = Service Level Agreement
It's a contract between an API provider and you (the customer) that defines:
- Uptime guarantees (99%, 99.9%, 99.99%)
- Performance targets (response time, throughput)
- Support response times (how fast they help when things break)
- Compensation (what you get when they fail to deliver)
Key point: An SLA is a promise, not a reality. It's what the provider aims for, not what you're guaranteed to experience.
The Truth About Uptime Percentages
Common SLA Tiers
| Uptime % | Downtime/Year | Downtime/Month | Downtime/Week | Real Impact |
|---|---|---|---|---|
| 90% | 36.5 days | 3 days | 16.8 hours | Unacceptable for production |
| 95% | 18.25 days | 1.5 days | 8.4 hours | Budget tier, risky |
| 99% | 3.65 days | 7.3 hours | 1.68 hours | Entry-level SaaS |
| 99.9% | 8.76 hours | 43.8 min | 10.1 min | Industry standard |
| 99.95% | 4.38 hours | 21.9 min | 5 min | High-quality APIs |
| 99.99% | 52.6 min | 4.38 min | 1.01 min | Enterprise grade |
| 99.999% | 5.26 min | 26 sec | 6 sec | "Five nines" (rare, expensive) |
What "99.9% Uptime" Actually Means
Scenario: Your payment API has a 99.9% SLA.
You think: "Great, only 10 minutes of downtime per week!"
Reality:
- 43.8 minutes/month can happen anytime (Murphy's Law: during peak hours)
- If you process $10,000/hour, that's about $7,300 in lost revenue
- Users don't care about your SLA—they just know your checkout is broken
- Some providers count "scheduled maintenance" separately (read the fine print!)
The math:
99.9% uptime = 0.1% downtime
0.1% of 730 hours/month = 43.8 minutes
43.8 minutes × $10,000/hour = $7,300 potential loss
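The same arithmetic can be reproduced for any SLA tier with a few lines of JavaScript. This is a minimal sketch: the function names are made up for illustration, and it assumes the 730-hour average month used above.

```javascript
// Hours in an average month (8,760 hours per year / 12), matching the 730 used above.
const HOURS_PER_MONTH = 730;

// Minutes of downtime that an uptime percentage permits over a period.
function allowedDowntimeMinutes(uptimePercent, periodHours = HOURS_PER_MONTH) {
  const downtimeFraction = 1 - uptimePercent / 100;
  return downtimeFraction * periodHours * 60;
}

// Revenue at risk if all permitted downtime lands while you earn revenuePerHour.
function revenueAtRisk(uptimePercent, revenuePerHour, periodHours = HOURS_PER_MONTH) {
  const downtimeHours = allowedDowntimeMinutes(uptimePercent, periodHours) / 60;
  return downtimeHours * revenuePerHour;
}

console.log(allowedDowntimeMinutes(99.9).toFixed(1)); // "43.8" minutes per month
console.log(Math.round(revenueAtRisk(99.9, 10000))); // 7300 dollars per month
```

Passing 8760 as `periodHours` gives the yearly figure instead, for example about 52.6 minutes for 99.99%.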
Bottom line: Even "excellent" SLAs allow significant downtime.
How API Providers Calculate Uptime
Method 1: Simple Availability
Formula: (Total time - Downtime) / Total time
Example:
- Month: 730 hours
- Downtime: 1 hour
- Uptime: (730 - 1) / 730 = 99.86%
Sounds simple, but...
Tricky parts:
What counts as "down"?
- Some providers only count total outages (API returns nothing)
- Slow responses (5 seconds instead of 100ms) might not count
- Partial outages (50% error rate) might be "up" by their definition
When is downtime measured?
- Only successful requests? (Ignores failed ones)
- Only peak hours? (Hides overnight issues)
- Excludes "scheduled maintenance"?
Method 2: Success Rate
Formula: Successful requests / Total requests
Example:
- 1 million requests
- 999,000 succeeded
- Uptime: 999,000 / 1,000,000 = 99.9%
This is a better metric because it reflects what users actually experience, not just whether the API is responding.
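To see how far the two methods can diverge, here is a small sketch that scores the same incident both ways. The traffic numbers are invented, and the "down only when every request fails" rule is an assumption standing in for a strict provider definition.

```javascript
// Each entry is one monitoring window (say, one minute) with request counts.
// The numbers are invented for illustration.
const windows = [
  { total: 1000, failed: 0 },
  { total: 1000, failed: 500 }, // partial outage: half the requests fail
  { total: 1000, failed: 0 },
  { total: 1000, failed: 1000 }, // total outage
];

// Method 1: time-based availability. A window is "down" only when every request fails.
function timeBasedAvailability(samples) {
  const downWindows = samples.filter((w) => w.total > 0 && w.failed === w.total).length;
  return (samples.length - downWindows) / samples.length;
}

// Method 2: request success rate across all windows.
function successRate(samples) {
  const total = samples.reduce((sum, w) => sum + w.total, 0);
  const failed = samples.reduce((sum, w) => sum + w.failed, 0);
  return total === 0 ? 1 : (total - failed) / total;
}

console.log(timeBasedAvailability(windows)); // 0.75
console.log(successRate(windows)); // 0.625
```

The partial outage is invisible to Method 1 but drags Method 2 down, which is why the definition of "down" matters as much as the headline percentage.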
Method 3: Weighted Availability
Some providers measure different endpoints separately:
Example (illustrative figures for a payments provider):
- Payment processing: 99.99% SLA (critical)
- Reporting API: 99.9% SLA (less critical)
- Webhooks: 99.95% SLA (important but not blocking)
Your actual uptime depends on which endpoints your product relies on, and which of them fails.
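If one feature needs several endpoints at once, a rough estimate of its availability is the product of the individual figures. The sketch below uses the example percentages above and assumes failures are independent, which is a simplification; `compositeAvailability` is a hypothetical helper, not part of any provider's SDK.

```javascript
// SLA figures from the example above, expressed as fractions.
const endpointSla = { payments: 0.9999, reporting: 0.999, webhooks: 0.9995 };

// Estimated availability of a feature that needs every listed endpoint to be up,
// assuming endpoint failures are independent of each other.
function compositeAvailability(slaByEndpoint, requiredEndpoints) {
  return requiredEndpoints.reduce((product, name) => {
    if (!(name in slaByEndpoint)) throw new Error(`Unknown endpoint: ${name}`);
    return product * slaByEndpoint[name];
  }, 1);
}

// A checkout flow that charges a card and then waits for a webhook.
const checkout = compositeAvailability(endpointSla, ["payments", "webhooks"]);
console.log((checkout * 100).toFixed(2) + "%"); // "99.94%"
```

The combined figure is lower than either endpoint's own SLA, so a flow built on two "good" endpoints can still fall below the tier you thought you were buying.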
SLA Fine Print: What They Don't Tell You
Exclusions (What Doesn't Count)
Most SLAs exclude:
1. Scheduled Maintenance
"We may take the service offline for up to 4 hours/month
for planned maintenance with 24-hour notice."
Translation: That 99.9% SLA just became roughly 99.35% in practice (0.73 hours of allowed downtime plus 4 hours of maintenance, out of 730 hours).
2. Your Fault
"Downtime caused by customer misuse, including rate
limit violations or invalid API calls, is excluded."
Translation: If you hit their API too hard and it throttles you, that's on you.
3. Force Majeure (Acts of God)
"Downtime due to natural disasters, wars, pandemics,
or other events beyond our control is excluded."
Translation: If AWS has a regional outage, your API provider isn't liable.
4. Third-Party Services
"We are not responsible for outages in dependencies
(DNS providers, CDN networks, etc.)."
Translation: Your API might be "up" even if it's unusable due to network issues.
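The effect of the scheduled-maintenance exclusion (item 1 above) is easy to quantify. This sketch adds excluded hours on top of the downtime the SLA already allows; the function name is hypothetical and the 730-hour month is the same assumption used earlier.

```javascript
const HOURS_PER_MONTH = 730;

// Uptime you can actually count on once excluded downtime is added back in.
function effectiveUptimePercent(slaPercent, excludedHoursPerMonth, periodHours = HOURS_PER_MONTH) {
  const slaDowntimeHours = (1 - slaPercent / 100) * periodHours;
  const totalDowntimeHours = slaDowntimeHours + excludedHoursPerMonth;
  return (1 - totalDowntimeHours / periodHours) * 100;
}

// 99.9% SLA with up to 4 hours/month of excluded maintenance.
console.log(effectiveUptimePercent(99.9, 4).toFixed(2)); // "99.35"
```

Running the same function with 0 excluded hours returns the headline number, which makes it a quick way to compare two SLAs with different exclusion clauses.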
Credits vs. Refunds
Most SLAs offer credits, not refunds:
Example (Typical SLA):
- 99.9% promised, 99% delivered → 10% credit
- 99.9% promised, 95% delivered → 25% credit
- 99.9% promised, 90% delivered → 50% credit
You pay $1,000/month and they're down for 7 hours (about 99% uptime for the month):
- Lost revenue: $20,000 (your payments were offline)
- Credit: $100 (10% of your monthly bill)
The math doesn't work out. SLA credits barely compensate for actual business impact.
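Here is a rough calculator for the example above. The tier boundaries are an assumption: the "typical SLA" list only gives three sample points, so this sketch treats 95% and 90% as cut-offs and anything else below the promised figure as the 10% tier.

```javascript
// Credit percentage for a delivered uptime, modeled on the typical tiers above.
// The exact boundaries are an assumption for illustration.
function creditPercent(deliveredUptimePercent, promisedUptimePercent = 99.9) {
  if (deliveredUptimePercent >= promisedUptimePercent) return 0;
  if (deliveredUptimePercent <= 90) return 50;
  if (deliveredUptimePercent <= 95) return 25;
  return 10;
}

// Credit owed for a month with the given hours of downtime.
function slaCredit(monthlyFee, downtimeHours, periodHours = 730) {
  const deliveredUptimePercent = (1 - downtimeHours / periodHours) * 100;
  const credit = (monthlyFee * creditPercent(deliveredUptimePercent)) / 100;
  return { deliveredUptimePercent, credit };
}

const { deliveredUptimePercent, credit } = slaCredit(1000, 7);
console.log(deliveredUptimePercent.toFixed(2)); // "99.04"
console.log(credit); // 100
```

Set against the $20,000 of lost revenue in the example, that $100 credit covers 0.5% of the damage.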
How to Claim Credits
Most providers require you to:
- Request credit within 30 days
- Prove the outage impacted you (logs, screenshots)
- Submit a formal ticket
They don't automatically apply credits. Many users never bother claiming, which saves providers money.
Real API SLA Examples
Stripe
Uptime SLA: 99.99% (52 minutes/year)
Fine print:
- Scheduled maintenance excluded (up to 4 hours/quarter)
- Only counts "platform unavailability" (not slow responses)
- Credits: 10-100% depending on severity
- Must claim within 30 days
Reality: Stripe is extremely reliable, but when it does go down (as in its widely reported 2019 outage), a large share of online checkouts fail with it.
OpenAI
Uptime SLA: None for standard tier
GPT-4 API: "We'll try our best" (no formal SLA)
Enterprise tier: Custom SLAs negotiated
Translation: If the API goes down, you're SOL unless you're paying enterprise rates.
AWS
Uptime SLA: 99.99% for EC2 (at the region level), 99.9% for S3 Standard
Fine print:
- Measured per region (not globally)
- Excludes "service-specific" issues
- Credits: 10-100% depending on severity
Reality: AWS is rock-solid, but regional outages happen (the US-East-1 outage in December 2021 took many major sites and services offline).
Twilio
Uptime SLA: 99.95%
SMS delivery: "Best effort" (no guarantee)
Credits: 10-100% based on downtime
Translation: The API can be "up" while your SMS never arrives, and that's not covered by the SLA.
What to Look for in an API SLA
1. Uptime Guarantee
Minimum acceptable:
- Critical APIs (payments, auth): 99.95%+
- Important APIs (email, SMS): 99.9%+
- Nice-to-have APIs (analytics): 99%+
Red flags:
- No published SLA (run away)
- Below 99% uptime
- "We'll try our best" (not a real SLA)
2. Performance Guarantees
Look for:
- P50 latency (median response time)
- P95 latency (95th percentile)
- P99 latency (99th percentile; only the slowest 1% of requests exceed it)
Example (Good SLA):
- P50: <100ms
- P95: <500ms
- P99: <2s
Example (Bad SLA):
- "Typical response time: 1-5 seconds"
- No P95/P99 metrics
- No latency SLA at all
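If you log response times yourself, you can check a provider against targets like the "good SLA" example. This is a minimal sketch using the nearest-rank percentile method (one of several common definitions); the function names and default targets are assumptions taken from the example figures above.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("No samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Check measured latencies (in ms) against per-percentile limits.
function meetsLatencyTargets(samplesMs, targets = { 50: 100, 95: 500, 99: 2000 }) {
  return Object.entries(targets).every(
    ([p, limitMs]) => percentile(samplesMs, Number(p)) < limitMs
  );
}

// 98 fast responses and 2 very slow ones.
const latencies = [...Array(98).fill(80), 5000, 5000];
console.log(percentile(latencies, 50)); // 80
console.log(percentile(latencies, 99)); // 5000
console.log(meetsLatencyTargets(latencies)); // false
```

The median looks healthy here, but the P99 check fails, which is exactly the kind of tail problem an SLA without P95/P99 figures lets a provider hide.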
3. Support Response Times
Tier levels:
| Severity | Enterprise | Business | Standard |
|---|---|---|---|
| Critical (down) | 15 min | 1 hour | 24 hours |
| High (degraded) | 1 hour | 4 hours | 48 hours |
| Medium | 4 hours | 24 hours | 5 days |
| Low | 24 hours | 5 days | Never |
Red flag: "We respond to all tickets within 7 business days" = they're not serious about uptime.
4. Compensation
Good SLA:
- Automatic credits (no claim needed)
- Prorated refunds
- 100% credit for severe outages
Bad SLA:
- "Credits at our discretion"
- Caps at 100% of monthly fee (doesn't cover actual losses)
- Complex claim process
How to Protect Yourself
1. Don't Rely on a Single API
Multi-provider strategy:
Payments:
- Primary: Stripe
- Backup: PayPal
- Failover: Auto-switch on error
AI:
- Primary: OpenAI GPT-4
- Backup: Anthropic Claude
- Fallback: Cached responses
Email:
- Primary: SendGrid
- Backup: Resend
- Failover: AWS SES
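One way to wire up a primary/backup/failover chain like the ones above is to try each provider in order and stop at the first success. This is only a sketch: the `{ name, send }` provider shape is hypothetical, and each `send` would wrap the real SDK call for that provider.

```javascript
// Try each provider in order and return the first successful result.
// `providers` is a list of { name, send } objects (a hypothetical wrapper shape).
async function sendWithFailover(providers, message) {
  const errors = [];
  for (const provider of providers) {
    try {
      const result = await provider.send(message);
      return { provider: provider.name, result };
    } catch (error) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Example with stand-in providers: the primary is down, the backup works.
const emailProviders = [
  { name: "primary", send: async () => { throw new Error("503 Service Unavailable"); } },
  { name: "backup", send: async (msg) => `queued: ${msg.subject}` },
];

sendWithFailover(emailProviders, { subject: "Welcome" }).then((outcome) => {
  console.log(outcome.provider); // "backup"
});
```

A chain like this can send the same message twice if the primary succeeded but timed out before answering, so for payments in particular you would want idempotency safeguards before switching providers automatically.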
2. Monitor Uptime Yourself
Don't trust the provider's status page.
Use third-party monitoring:
- API Status Check - Real-time monitoring for 100+ APIs
- Datadog - Full infrastructure monitoring
- Pingdom - Uptime tracking
Why? Providers define "up" differently than you do. Monitor from your users' perspective.
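If you want a quick check of your own alongside those tools, a probe can be as small as the sketch below. It assumes a runtime with global `fetch` and `AbortSignal.timeout` (Node 18+ or a modern browser); the URL, thresholds, and result shape are placeholders, and `fetchImpl` is injectable so the function can be exercised without network access.

```javascript
// Probe an endpoint once and record what a user would have experienced.
async function probe(url, { timeoutMs = 5000, slowMs = 1000, fetchImpl = fetch } = {}) {
  const startedAt = Date.now();
  try {
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    const latencyMs = Date.now() - startedAt;
    // Treat non-2xx and slow responses as failures, even though the API "responded".
    const ok = response.ok && latencyMs <= slowMs;
    return { ok, status: response.status, latencyMs };
  } catch (error) {
    // Network errors and timeouts end up here.
    return { ok: false, status: null, latencyMs: Date.now() - startedAt, error: error.message };
  }
}

// Usage (placeholder URL): run on a schedule and store the results.
// probe("https://api.example.com/health").then(console.log);
```

Feeding the stored results into your own success-rate calculation gives you an uptime number based on your definition of "up", which is the one to bring to a credit claim.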
3. Build in Graceful Degradation
When APIs fail, don't break your entire product.
Strategies:
- Cache responses (show stale data during outages)
- Queue requests (process when API comes back)
- Show friendly errors ("Payment system temporarily unavailable, try PayPal")
Example:
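// stripeClient and paypalClient stand in for your own payment wrappers
async function processPayment(order) {
  try {
    return await stripeClient.charge(order);
  } catch (error) {
    // Stripe unreachable or erroring? Log it, then try PayPal.
    // In production, fall back only on outage-type errors (timeouts, 5xx),
    // not on card declines, and make sure a retry cannot double-charge.
    console.error("Stripe charge failed, falling back to PayPal:", error);
    return await paypalClient.charge(order);
  }
}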
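The "cache responses" strategy can be sketched in a similar way: keep the last good response and serve it, marked as stale, when the live call fails. `withStaleFallback` and `fetchFresh` are hypothetical names, and the 15-minute staleness limit is an arbitrary default you would tune per endpoint.

```javascript
// Wrap a fetcher so the last good response is served when the live call fails.
// `fetchFresh` is a hypothetical async function that calls the upstream API.
function withStaleFallback(fetchFresh, { maxStaleMs = 15 * 60 * 1000, now = Date.now } = {}) {
  let cached = null; // { value, storedAt } from the most recent successful call

  return async function get() {
    try {
      const value = await fetchFresh();
      cached = { value, storedAt: now() };
      return { value, stale: false };
    } catch (error) {
      // Serve the cached copy only while it is still within the staleness limit.
      if (cached && now() - cached.storedAt <= maxStaleMs) {
        return { value: cached.value, stale: true };
      }
      throw error; // nothing usable in the cache, so surface the failure
    }
  };
}

// Usage: const getRates = withStaleFallback(() => ratesApi.fetchLatest());
// (ratesApi is a placeholder for whichever client you already use.)
```

Returning a `stale` flag lets the UI say "prices as of a few minutes ago" instead of silently showing old data. This fits read-only data; for writes such as payments, queueing is the safer pattern.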
4. Negotiate Better Terms
If you're paying $5K+/month, negotiate:
- Higher uptime guarantee (99.95% → 99.99%)
- Faster support response
- Better compensation (revenue loss coverage)
- Custom SLAs for critical features
Leverage: "We're evaluating competitors. Can you match their 99.99% SLA?"
Questions to Ask Before Signing
1. What counts as downtime?
- Total outage only?
- Slow responses?
- Partial failures?
2. How do you measure uptime?
- Per endpoint?
- Global or per region?
- Success rate or availability?
3. What's excluded from the SLA?
- Scheduled maintenance?
- DDoS attacks?
- Third-party dependencies?
4. How do I claim credits?
- Automatic or manual?
- Proof required?
- Time limits?
5. What happens during extended outages?
- Full refund?
- Contract termination option?
- Revenue loss coverage?
6. Do you have a track record?
- Historical uptime stats?
- Public status page?
- Post-mortems from past outages?
Red Flags to Watch For
❌ No public SLA
If they won't publish uptime guarantees, assume the worst.
❌ "Best effort" language
It commits the provider to nothing you can enforce.
❌ Credits capped at monthly fee
You lost $100K, they give you $500 credit. Not fair.
❌ No performance metrics
Uptime without latency SLA = useless. A 30-second response time is technically "up."
❌ Vague exclusions
"Downtime beyond our control" could mean anything.
❌ Manual credit claims only
Friction = fewer claims = they save money.
The Bottom Line
99.9% uptime sounds good until you do the math:
- 43 minutes/month = potential revenue loss
- SLA credits rarely cover actual damages
- Fine print excludes most real-world scenarios
How to protect yourself:
- Diversify: Use multiple providers for critical APIs
- Monitor: Don't trust their status page
- Degrade gracefully: Build fallbacks into your product
- Negotiate: If you're paying serious money, get better terms
Remember: An SLA is a minimum bar, not a promise of perfection. Even the best APIs go down. Your job is to make sure your product survives when they do.