Understanding API SLAs: What 99.9% Uptime Really Means

"99.9% uptime guaranteed!" sounds great until you do the math. That's 8.76 hours of downtime per year—or 43.8 minutes per month. For a payment API like Stripe, that could mean thousands of failed transactions.

Most developers glance at SLA numbers without understanding what they actually mean. Then downtime hits, revenue tanks, and they realize the fine print matters.

Here's everything you need to know about API SLAs—and how to avoid getting burned.

What is an API SLA?

SLA = Service Level Agreement

It's a contract between an API provider and you (the customer) that defines:

  • Uptime guarantees (99%, 99.9%, 99.99%)
  • Performance targets (response time, throughput)
  • Support response times (how fast they help when things break)
  • Compensation (what you get when they fail to deliver)

Key point: An SLA is a promise, not a measurement. It's what the provider commits to (and compensates you for missing), not what you're guaranteed to experience.

The Truth About Uptime Percentages

Common SLA Tiers

Uptime % | Downtime/Year | Downtime/Month | Downtime/Week | Real Impact
90%      | 36.5 days     | 3 days         | 16.8 hours    | Unacceptable for production
95%      | 18.25 days    | 1.5 days       | 8.4 hours     | Budget tier, risky
99%      | 3.65 days     | 7.3 hours      | 1.68 hours    | Entry-level SaaS
99.9%    | 8.76 hours    | 43.8 min       | 10.1 min      | Industry standard
99.95%   | 4.38 hours    | 21.9 min       | 5 min         | High-quality APIs
99.99%   | 52.6 min      | 4.38 min       | 1.01 min      | Enterprise grade
99.999%  | 5.26 min      | 26 sec         | 6 sec         | "Five nines" (rare, expensive)
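
To check a tier that isn't in the table, the conversion can be sketched in a few lines. This is a minimal example, assuming average period lengths of 8,760 hours/year, 730 hours/month, and 168 hours/week; the function name is made up for illustration.

```javascript
// Convert an uptime percentage into the downtime it permits per period.
// Assumed period lengths: 8,760 h/year, 730 h/month (average), 168 h/week.
const HOURS_PER_PERIOD = { year: 8760, month: 730, week: 168 };

function downtimeBudgetMinutes(uptimePercent) {
  if (uptimePercent < 0 || uptimePercent > 100) {
    throw new RangeError("uptimePercent must be between 0 and 100");
  }
  const downFraction = (100 - uptimePercent) / 100;
  const budget = {};
  for (const [period, hours] of Object.entries(HOURS_PER_PERIOD)) {
    budget[period] = downFraction * hours * 60; // minutes of allowed downtime
  }
  return budget;
}

const budget = downtimeBudgetMinutes(99.9);
console.log(budget.month.toFixed(1)); // 43.8
```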

What "99.9% Uptime" Actually Means

Scenario: Your payment API has a 99.9% SLA.

You think: "Great, only 10 minutes of downtime per week!"

Reality:

  • 43.8 minutes/month can happen anytime (Murphy's Law: during peak hours)
  • If you process $10,000/hour, that's about $7,300 in lost revenue
  • Users don't care about your SLA—they just know your checkout is broken
  • Some providers count "scheduled maintenance" separately (read the fine print!)

The math:

99.9% uptime = 0.1% downtime
0.1% of 730 hours/month = 0.73 hours = 43.8 minutes
0.73 hours × $10,000/hour = $7,300 potential loss

Bottom line: Even "excellent" SLAs allow significant downtime.

How API Providers Calculate Uptime

Method 1: Simple Availability

Formula: (Total time - Downtime) / Total time

Example:

  • Month: 730 hours
  • Downtime: 1 hour
  • Uptime: (730 - 1) / 730 = 99.86%

Sounds simple, but...

Tricky parts:

  1. What counts as "down"?

    • Some providers only count total outages (API returns nothing)
    • Slow responses (5 seconds instead of 100ms) might not count
    • Partial outages (50% error rate) might be "up" by their definition
  2. When is downtime measured?

    • Only successful requests? (Ignores failed ones)
    • Only peak hours? (Hides overnight issues)
    • Excludes "scheduled maintenance"?
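
The effect of those definitions can be sketched by computing the same month twice: once with the provider's exclusions and once without. This is an illustrative example only; the incident records and the `kind` labels are hypothetical, not taken from any real SLA.

```javascript
// Time-based availability: (total time - counted downtime) / total time.
// `excludedKinds` lists the incident kinds the measurer chooses not to count.
function availabilityPercent(totalMinutes, incidents, excludedKinds = []) {
  const countedDowntime = incidents
    .filter((incident) => !excludedKinds.includes(incident.kind))
    .reduce((sum, incident) => sum + incident.minutes, 0);
  return ((totalMinutes - countedDowntime) / totalMinutes) * 100;
}

const monthMinutes = 730 * 60;
const incidents = [
  { minutes: 60, kind: "outage" },       // API returned nothing
  { minutes: 120, kind: "maintenance" }, // announced in advance
  { minutes: 45, kind: "degraded" },     // slow or partially failing
];

// Provider's view: only the full outage counts.
const providerView = availabilityPercent(monthMinutes, incidents, ["maintenance", "degraded"]);
// Your view: every minute users were affected counts.
const customerView = availabilityPercent(monthMinutes, incidents);

console.log(providerView.toFixed(2), customerView.toFixed(2)); // 99.86 99.49
```

Same month, same incidents: one number misses a 99.9% SLA slightly, the other misses it badly.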

Method 2: Success Rate

Formula: Successful requests / Total requests

Example:

  • 1 million requests
  • 999,000 succeeded
  • Uptime: 999,000 / 1,000,000 = 99.9%

Better metric because it reflects user experience, not just "API is responding."
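
A rough sketch of a request-based success rate computed from your own logs follows. It assumes each log record has `status` and `durationMs` fields (hypothetical names), treats only 5xx responses as provider failures, and optionally counts slow responses as failures too.

```javascript
// Success rate = successful requests / total requests, as a percentage.
// Assumption: 4xx responses are the caller's fault, so only 5xx counts as failure.
function successRate(requests, { maxLatencyMs = Infinity } = {}) {
  if (requests.length === 0) return null; // no traffic, nothing to measure
  const succeeded = requests.filter(
    (r) => r.status < 500 && r.durationMs <= maxLatencyMs
  ).length;
  return (succeeded / requests.length) * 100;
}

const sample = [
  { status: 200, durationMs: 80 },
  { status: 200, durationMs: 5200 }, // technically "up", practically unusable
  { status: 503, durationMs: 30 },
  { status: 200, durationMs: 120 },
];

console.log(successRate(sample));                         // 75
console.log(successRate(sample, { maxLatencyMs: 1000 })); // 50
```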

Method 3: Weighted Availability

Some providers measure different endpoints separately:

Example (illustrative figures for a payment provider such as Stripe):

  • Payment processing: 99.99% SLA (critical)
  • Reporting API: 99.9% SLA (less critical)
  • Webhooks: 99.95% SLA (important but not blocking)

Your actual uptime: Depends on which endpoint fails.
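
If one user flow needs several endpoints to work at once, a back-of-the-envelope estimate multiplies their availabilities together. The sketch below assumes failures are independent, which real outages often are not, so treat the result as a rough guide rather than a measurement.

```javascript
// Combined availability of endpoints that must ALL work for a flow to succeed.
// Assumption: endpoint failures are independent of each other.
function compositeAvailability(percentages) {
  return percentages.reduce((acc, p) => acc * (p / 100), 1) * 100;
}

// A flow that touches payment processing, webhooks, and reporting:
const flow = compositeAvailability([99.99, 99.95, 99.9]);
console.log(flow.toFixed(2)); // 99.84
```

The combined figure is lower than the weakest individual SLA, which is why per-endpoint numbers can flatter the overall picture.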

SLA Fine Print: What They Don't Tell You

Exclusions (What Doesn't Count)

Most SLAs exclude:

1. Scheduled Maintenance

"We may take the service offline for up to 4 hours/month 
for planned maintenance with 24-hour notice."

Translation: 4 hours is about 0.55% of a 730-hour month, so that 99.9% SLA just became roughly 99.35% in practice.
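
The adjustment is simple enough to sketch. This assumes a 730-hour month and that the provider uses its full maintenance allowance on top of its full SLA downtime allowance, which is the worst case rather than the typical one.

```javascript
// Worst-case uptime once excluded maintenance is added back in.
function effectiveUptimePercent(slaPercent, maintenanceHours, periodHours = 730) {
  const slaDowntimeHours = ((100 - slaPercent) / 100) * periodHours;
  const totalDowntimeHours = slaDowntimeHours + maintenanceHours;
  return ((periodHours - totalDowntimeHours) / periodHours) * 100;
}

console.log(effectiveUptimePercent(99.9, 4).toFixed(2)); // 99.35
```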

2. Your Fault

"Downtime caused by customer misuse, including rate 
limit violations or invalid API calls, is excluded."

Translation: If you hit their API too hard and it throttles you, that's on you.
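
One way to stay on the right side of this clause is to back off when throttled instead of retrying immediately. The sketch below assumes `sendRequest` is a caller-supplied function (hypothetical) that returns a fetch-style Response. It honors a numeric `Retry-After` header when present and otherwise doubles the delay on each attempt.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry on HTTP 429 with exponential backoff. `wait` is injectable for testing.
async function requestWithBackoff(
  sendRequest,
  { maxAttempts = 5, baseDelayMs = 500, wait = sleep } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429) return response;
    if (attempt === maxAttempts) break;

    // Retry-After in seconds wins; a missing or date-formatted header
    // becomes 0 or NaN here and falls through to exponential backoff.
    const retryAfterSeconds = Number(response.headers.get("retry-after"));
    const delayMs = retryAfterSeconds > 0
      ? retryAfterSeconds * 1000
      : baseDelayMs * 2 ** (attempt - 1);
    await wait(delayMs);
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}
```

With the defaults, the waits are 500 ms, 1 s, 2 s, and 4 s before the call gives up.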

3. Force Majeure (Acts of God)

"Downtime due to natural disasters, wars, pandemics, 
or other events beyond our control is excluded."

Translation: If AWS has a regional outage, your API provider isn't liable.

4. Third-Party Services

"We are not responsible for outages in dependencies 
(DNS providers, CDN networks, etc.)."

Translation: Your API might be "up" even if it's unusable due to network issues.

Credits vs. Refunds

Most SLAs offer credits, not refunds:

Example (Typical SLA):

  • 99.9% promised, 99% delivered → 10% credit
  • 99.9% promised, 95% delivered → 25% credit
  • 99.9% promised, 90% delivered → 50% credit

You pay $1,000/month, they're down for 7 hours (about 99.04% uptime over a 730-hour month, so the 10% tier applies):

  • Lost revenue: $20,000 (your payments were offline)
  • Credit: $100 (10% of your monthly bill)

The math doesn't work out. SLA credits barely compensate for actual business impact.
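
The gap is easy to quantify. The sketch below encodes one possible reading of the example tiers above (any miss of 99.9% earns 10%, 95% or lower earns 25%, 90% or lower earns 50%). Real SLAs define their own thresholds, so treat the cutoffs as assumptions.

```javascript
// Credit owed under the example tiers, for a given amount of downtime.
function slaCredit(monthlyFee, downtimeHours, periodHours = 730) {
  const delivered = ((periodHours - downtimeHours) / periodHours) * 100;
  let creditPercent = 0;
  if (delivered <= 90) creditPercent = 50;
  else if (delivered <= 95) creditPercent = 25;
  else if (delivered < 99.9) creditPercent = 10;
  return { delivered, creditPercent, credit: monthlyFee * (creditPercent / 100) };
}

const outcome = slaCredit(1000, 7);
const lostRevenue = 20000; // figure from the example above
console.log(outcome.delivered.toFixed(2));          // 99.04
console.log(outcome.credit);                        // 100
console.log((outcome.credit / lostRevenue) * 100);  // 0.5 (percent of the loss recovered)
```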

How to Claim Credits

Most providers require you to:

  1. Request credit within 30 days
  2. Prove the outage impacted you (logs, screenshots)
  3. Submit a formal ticket

They don't automatically apply credits. Many customers never file a claim, which means the provider never pays out.

Real API SLA Examples

Stripe

Uptime SLA: 99.99% (about 52.6 minutes/year)

Fine print:

  • Scheduled maintenance excluded (up to 4 hours/quarter)
  • Only counts "platform unavailability" (not slow responses)
  • Credits: 10-100% depending on severity
  • Must claim within 30 days

Reality: Stripe is extremely reliable, but when it does go down (July 2019, roughly two hours of degraded service across two incidents), a large share of online checkouts fail with it.

OpenAI

Uptime SLA: None for standard tier

GPT-4 API: "We'll try our best" (no formal SLA)

Enterprise tier: Custom SLAs negotiated

Translation: If the OpenAI API goes down, you're out of luck unless you're paying enterprise rates.

AWS

Uptime SLA: 99.99% (EC2, region-level), 99.9% (S3 Standard)

Fine print:

  • Measured per region (not globally)
  • Excludes "service-specific" issues
  • Credits: 10-100% depending on severity

Reality: AWS is rock-solid, but regional outages happen (the us-east-1 outage in December 2021 disrupted a long list of major sites and services).

Twilio

Uptime SLA: 99.95%

SMS delivery: "Best effort" (no guarantee)

Credits: 10-100% based on downtime

Translation: Messages might never be delivered, and that's not covered by the SLA.

What to Look for in an API SLA

1. Uptime Guarantee

Minimum acceptable:

  • Critical APIs (payments, auth): 99.95%+
  • Important APIs (email, SMS): 99.9%+
  • Nice-to-have APIs (analytics): 99%+

Red flags:

  • No published SLA (run away)
  • Below 99% uptime
  • "We'll try our best" (not a real SLA)

2. Performance Guarantees

Look for:

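  • P50 latency (median: half of requests are faster than this)
  • P95 latency (95th percentile: 95% of requests are faster than this)
  • P99 latency (99th percentile: only the slowest 1% of requests exceed it)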

Example (Good SLA):

  • P50: <100ms
  • P95: <500ms
  • P99: <2s

Example (Bad SLA):

  • "Typical response time: 1-5 seconds"
  • No P95/P99 metrics
  • No latency SLA at all
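
To check a provider against targets like these, you can compute percentiles from your own latency samples. This is a minimal sketch using the nearest-rank method (other percentile definitions interpolate and give slightly different values); the sample data is made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

const latenciesMs = [82, 95, 110, 87, 430, 91, 1250, 99, 105, 93];
const targetsMs = { 50: 100, 95: 500, 99: 2000 }; // the "good SLA" example above

for (const [p, target] of Object.entries(targetsMs)) {
  const observed = percentile(latenciesMs, Number(p));
  console.log(`P${p}: ${observed}ms (target <${target}ms) ${observed < target ? "ok" : "MISSED"}`);
}
// P50: 95ms (target <100ms) ok
// P95: 1250ms (target <500ms) MISSED
// P99: 1250ms (target <2000ms) ok
```

Note how the median looks healthy while the tail misses the target. That is exactly what an average or "typical response time" hides.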

3. Support Response Times

Tier levels:

Severity        | Enterprise | Business | Standard
Critical (down) | 15 min     | 1 hour   | 24 hours
High (degraded) | 1 hour     | 4 hours  | 48 hours
Medium          | 4 hours    | 24 hours | 5 days
Low             | 24 hours   | 5 days   | Never

Red flag: "We respond to all tickets within 7 business days" = they're not serious about uptime.

4. Compensation

Good SLA:

  • Automatic credits (no claim needed)
  • Prorated refunds
  • 100% credit for severe outages

Bad SLA:

  • "Credits at our discretion"
  • Caps at 100% of monthly fee (doesn't cover actual losses)
  • Complex claim process

How to Protect Yourself

1. Don't Rely on a Single API

Multi-provider strategy:

Payments:

  • Primary: Stripe
  • Backup: PayPal
  • Failover: Auto-switch on error

AI:

  • Primary: OpenAI GPT-4
  • Backup: Anthropic Claude
  • Fallback: Cached responses

Email:

  • Primary: SendGrid
  • Backup: Resend
  • Failover: AWS SES

2. Monitor Uptime Yourself

Don't trust the provider's status page.

Use third-party monitoring:

  • API Status Check - Real-time monitoring for 100+ APIs
  • Datadog - Full infrastructure monitoring
  • Pingdom - Uptime tracking

Why? Providers define "up" differently than you do. Monitor from your users' perspective.
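
If you would rather start with something homegrown, a basic probe is not much code. The sketch below assumes a runtime with a global `fetch` (Node 18+ or a browser). The `fetchImpl` parameter is only there so the probe can be tested without a network, and any URL you pass is your own choice of health or low-cost endpoint.

```javascript
// Probe one endpoint and report whether it was usable within the timeout.
async function probe(url, { timeoutMs = 2000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const response = await fetchImpl(url, { signal: controller.signal });
    return { up: response.ok, status: response.status, durationMs: Date.now() - startedAt };
  } catch (error) {
    // Network failures and timeouts both count as "down" from the user's side.
    return { up: false, status: null, durationMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}

// Observed uptime over a series of probe results, as a percentage.
function observedUptime(results) {
  if (results.length === 0) return null;
  return (results.filter((r) => r.up).length / results.length) * 100;
}
```

Run `probe` on a schedule from the regions your users are in, store the results, and compare `observedUptime` with what the provider reports. Those stored results are also the evidence you will need for a credit claim.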

3. Build in Graceful Degradation

When APIs fail, don't break your entire product.

Strategies:

  • Cache responses (show stale data during outages)
  • Queue requests (process when API comes back)
  • Show friendly errors ("Payment system temporarily unavailable, try PayPal")

Example:

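// Sketch: `primary` and `backup` are hypothetical wrappers around your payment
// SDKs, each exposing charge(order) and throwing an error with `status` on failure.
async function processPayment(order, primary, backup) {
  try {
    return await primary.charge(order);
  } catch (error) {
    // Fail over only on provider-side failures (network errors, 5xx).
    // A declined card (4xx) would be declined by the backup too.
    if (error.status && error.status < 500) throw error;
    console.warn(`Primary provider failed (${error.message}), trying backup`);
    return await backup.charge(order);
  }
}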
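
The caching strategy from the list above can be sketched the same way. This is a minimal in-memory version, assuming `loadFresh` is whatever function calls the upstream API (hypothetical). A production setup would more likely use a shared cache than a per-process `Map`.

```javascript
const cache = new Map(); // key -> { value, storedAt }

// Return fresh data when the API works; fall back to recent cached data when it doesn't.
async function fetchWithStaleFallback(
  key,
  loadFresh,
  { maxStaleMs = 15 * 60 * 1000, now = Date.now } = {}
) {
  try {
    const value = await loadFresh();
    cache.set(key, { value, storedAt: now() });
    return { value, stale: false };
  } catch (error) {
    const cached = cache.get(key);
    if (cached && now() - cached.storedAt <= maxStaleMs) {
      return { value: cached.value, stale: true }; // caller can show a "last updated" notice
    }
    throw error; // nothing usable cached, so surface the failure
  }
}
```

Returning the `stale` flag instead of hiding it lets the UI tell users they are looking at older data.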

4. Negotiate Better Terms

If you're paying $5K+/month, negotiate:

  • Higher uptime guarantee (99.95% → 99.99%)
  • Faster support response
  • Better compensation (revenue loss coverage)
  • Custom SLAs for critical features

Leverage: "We're evaluating competitors. Can you match their 99.99% SLA?"

Questions to Ask Before Signing

1. What counts as downtime?

  • Total outage only?
  • Slow responses?
  • Partial failures?

2. How do you measure uptime?

  • Per endpoint?
  • Global or per region?
  • Success rate or availability?

3. What's excluded from the SLA?

  • Scheduled maintenance?
  • DDoS attacks?
  • Third-party dependencies?

4. How do I claim credits?

  • Automatic or manual?
  • Proof required?
  • Time limits?

5. What happens during extended outages?

  • Full refund?
  • Contract termination option?
  • Revenue loss coverage?

6. Do you have a track record?

  • Historical uptime stats?
  • Public status page?
  • Post-mortems from past outages?

Red Flags to Watch For

❌ No public SLA
If they won't publish uptime guarantees, assume the worst.

❌ "Best effort" language
Not legally binding. Means nothing.

❌ Credits capped at monthly fee
You lost $100K, they give you $500 credit. Not fair.

❌ No performance metrics
Uptime without latency SLA = useless. A 30-second response time is technically "up."

❌ Vague exclusions
"Downtime beyond our control" could mean anything.

❌ Manual credit claims only
Friction = fewer claims = they save money.

The Bottom Line

99.9% uptime sounds good until you do the math:

  • 43.8 minutes/month of allowed downtime = potential revenue loss
  • SLA credits rarely cover actual damages
  • Fine print excludes most real-world scenarios

How to protect yourself:

  1. Diversify: Use multiple providers for critical APIs
  2. Monitor: Don't trust their status page
  3. Degrade gracefully: Build fallbacks into your product
  4. Negotiate: If you're paying serious money, get better terms

Remember: An SLA is a minimum bar, not a promise of perfection. Even the best APIs go down. Your job is to make sure your product survives when they do.
