Is Azure Down? Complete Status Check Guide + Quick Fixes

Azure Portal not loading?
VMs unresponsive?
App Service deployment failing?

Before panicking, verify whether Azure is actually down, or whether you're facing a configuration issue, a quota problem, or a regional outage. Here's your complete guide to checking Azure status and fixing common cloud infrastructure issues.

Quick Check: Is Azure Actually Down?

Don't assume it's Azure. 50% of "Azure down" reports are actually configuration errors, quota limits, or subscription issues, not platform outages.

1. Check Official Sources

Azure Status Page:
🔗 status.azure.com

What to look for:

  • ✅ "No current issues" = Azure is fine
  • ⚠️ "Active event" = Some services/regions affected
  • 🔴 "Outage" = Azure is down

Real-time updates:

  • Azure Portal availability
  • Virtual Machines status
  • App Service health
  • Microsoft Entra ID (formerly Azure Active Directory)
  • Storage Accounts
  • Azure Functions
  • Regional outages
  • Service-specific incidents

Pro tip: Filter by service and region you're using.

API Status Check:
🔗 apistatuscheck.com/api/azure

Why use it:

  • Real-time monitoring (checks every 5 minutes)
  • Historical uptime data
  • Instant alerts (Slack, Discord, email)
  • Tracks Portal, VMs, App Service separately
  • Third-party verification

Twitter/X Search:
🔗 Search "Azure down" on Twitter/X

Why it works:

  • Users report issues instantly
  • See whether others are experiencing the same problem
  • Regional patterns emerge
  • Microsoft responds here: @Azure, @AzureSupport

Pro tip: If 1,000+ tweets in last hour mention "Azure down," it's likely a real outage.


DownDetector:
🔗 downdetector.com/status/windows-azure

Shows:

  • Real-time user reports
  • Heatmap of affected areas
  • Most reported problems (portal, VMs, storage)

2. Check Service-Specific Status

Azure has 200+ services that can fail independently:

Service | What It Does | Common Issues
Azure Portal | Web management interface | Portal not loading, timeouts
Virtual Machines | IaaS compute | VM not starting, connectivity lost
App Service | PaaS web hosting | Deployment fails, apps down
Azure AD (Entra ID) | Identity/authentication | Login failures, token errors
Azure Storage | Blob/file/queue storage | Upload fails, access denied
Azure Functions | Serverless compute | Function not triggering, timeouts
Azure SQL | Managed databases | Connection failures, performance
Azure DevOps | CI/CD platform | Pipeline failures, repo access

Your service might be down while Azure globally is up.


3. Check Regional Status

Azure has 60+ regions worldwide. Outages are often regional.

Check your region:

  1. Go to status.azure.com
  2. Filter by your region (e.g., "East US", "West Europe")
  3. See whether there are active incidents in your region

Find your resource region:

  • Azure Portal → Your resource → Overview → Location

Multi-region strategy:

  • If East US is down, try deploying to West US temporarily
  • Production apps should span multiple regions

4. Test Different Access Methods

If Azure Portal works but Azure CLI doesn't, it's likely tool-specific.

Platform | Test Method
Azure Portal | portal.azure.com
Azure CLI | az login && az account show
Azure PowerShell | Connect-AzAccount
Azure Mobile App | Launch Azure app (iOS/Android)

Decision tree:

Portal works + CLI fails → CLI auth/config issue
Portal fails + CLI works → Browser/network issue
Nothing works → Azure likely down (or subscription issue)
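
The decision tree above can be sketched as a small shell helper. This is only an illustration: the function name diagnose_access and its yes/no inputs are assumptions (they are not part of any Azure tooling), and the "both work" branch is added here for completeness.

```shell
# Hypothetical helper (not part of the Azure CLI): map two manual test
# results onto the decision tree.
#   $1 = "yes" if portal.azure.com loads in a browser, otherwise "no"
#   $2 = "yes" if `az account show` succeeds, otherwise "no"
diagnose_access() {
  case "$1/$2" in
    yes/yes) echo "Both work: look at the specific resource instead" ;;
    yes/no)  echo "CLI auth/config issue" ;;
    no/yes)  echo "Browser/network issue" ;;
    *)       echo "Azure likely down (or subscription issue)" ;;
  esac
}

diagnose_access yes no   # prints: CLI auth/config issue
```

Any input other than the three explicit combinations falls through to the last branch, so a typo in the arguments errs on the side of checking the status page.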

Common Azure Error Messages (And What They Mean)

"This site can't be reached" (Azure Portal)

What it means: Can't connect to portal.azure.com.

Causes:

  • Internet connection issue
  • Firewall blocking Azure domains
  • DNS resolution failure
  • Rare: Azure Portal outage

Quick fixes:

  1. Test internet connection (visit google.com)
  2. Check DNS: nslookup portal.azure.com
  3. Try different browser
  4. Try incognito/private mode
  5. Disable VPN temporarily (test)
  6. Check firewall settings
  7. Try Azure CLI (bypass portal entirely)

For corporate networks:

  • Allowlist *.azure.com, *.microsoft.com, and *.microsoftonline.com (used for sign-in)
  • Check proxy configuration
  • Contact IT admin

"Subscription not found" or "No subscriptions found"

What it means: Can't access Azure subscription.

Causes:

  • Signed in with wrong account
  • Subscription expired/disabled
  • No subscriptions associated with account
  • Permissions revoked

Quick fixes:

  1. Verify signed-in account: Portal → Profile icon → Check email
  2. Switch directory: Portal → Settings → Directories + subscriptions
  3. Check subscription status: Portal → Subscriptions → Status column
  4. Verify payment method current (credit card not expired)
  5. Contact subscription admin (may need access granted)

Check subscription status via CLI:

az account list --output table
az account show

"The subscription is disabled and therefore marked as read only"

What it means: Subscription suspended.

Causes:

  • Payment method failed
  • Spending limit reached
  • Trial expired
  • Credit card expired
  • Account under review

Quick fixes:

  1. Go to Portal → Cost Management + Billing (successor to the Azure Account Center)
  2. Update payment method
  3. Check for outstanding invoices
  4. Remove spending limit (if applicable)
  5. Contact Azure Support (may be fraud hold)

For free trial:

  • Free trial includes $200 credit, valid for the first 30 days
  • Must upgrade to pay-as-you-go to continue

"Quota exceeded" or "Operation could not be completed as it results in exceeding quota limits"

What it means: Hit subscription or regional quota limit.

Causes:

  • Too many VMs in region
  • Too many cores requested
  • Too many storage accounts
  • Public IP address limit reached

Quick fixes:

1. Check current quota:

  • Portal → Subscriptions → Your subscription → Usage + quotas
  • Filter by region and service

2. Request quota increase:

  • Portal → Help + support → New support request
  • Issue type: Service and subscription limits (quotas)
  • Provide justification and desired limit

3. Clean up unused resources:

# List all VMs
az vm list --output table

# Delete unused VM
az vm delete --name MyVM --resource-group MyRG --yes

# List unattached disks (managedBy is null when no VM owns the disk)
az disk list --query '[?managedBy==`null`]' --output table

4. Use different region:

  • Some regions have higher limits
  • Try deploying to less-congested region

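Common quotas (defaults vary by subscription type and can change; confirm under Usage + quotas):

  • VMs: in practice limited by the vCPU quota rather than a per-VM count
  • vCPUs: 10-20 per region (default)
  • Storage accounts: 250 per region
  • Public IPs: 10-20 per region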
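
Before deploying, it can help to compare what you need against what is left. The sketch below is a hypothetical helper (quota_headroom is not an Azure command). It assumes you read the current usage and limit yourself, for example from Usage + quotas or from `az vm list-usage --location eastus --output table`.

```shell
# Hypothetical helper: check whether a deployment fits the remaining quota.
#   $1 = current usage, $2 = quota limit, $3 = amount the deployment needs
quota_headroom() {
  remaining=$(( $2 - $1 ))
  if [ "$3" -le "$remaining" ]; then
    echo "ok: $(( remaining - $3 )) left after deployment"
  else
    echo "blocked: short by $(( $3 - remaining ))"
  fi
}

quota_headroom 16 20 8   # prints: blocked: short by 4
```

A "blocked" result means either requesting an increase or cleaning up resources first, as described in the quick fixes above.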

"Allocation failed" (Virtual Machines)

What it means: Azure can't allocate hardware for your VM.

Causes:

  • Datacenter capacity constraints
  • Specific VM size unavailable in region
  • Availability zone full
  • Hardware generation not available

Quick fixes:

1. Try different region:

# Check VM size availability in regions
az vm list-sizes --location eastus --output table
az vm list-sizes --location westus --output table

2. Try different VM size:

  • Use similar size (e.g., D2s_v3 instead of D2_v3)
  • Older generation may have availability

3. Stop (deallocate) and restart VM:

  • Stop (deallocate) VM
  • Wait a few minutes
  • Start VM again (may allocate to different hardware)

az vm deallocate --name MyVM --resource-group MyRG
az vm start --name MyVM --resource-group MyRG

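4. Check availability set constraints:

  • VMs in an availability set are tied to one hardware cluster, which can itself cause allocation failures
  • If the VM is in an availability set, deallocate all VMs in the set, then start them again so Azure can pick another cluster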

5. Contact Azure Support:

  • For critical workloads, support can help with allocation

"Authentication failed" or "AADSTS" errors (Azure AD/Entra ID)

What it means: Can't authenticate to Azure AD.

Causes:

  • Password incorrect
  • MFA issue
  • Conditional access blocking
  • Token expired
  • Service principal credentials invalid

Quick fixes:

1. Verify credentials:

  • Confirm the username, password, and tenant are correct

2. Clear token cache (Azure CLI):

az account clear
az login

3. Check MFA:

  • Complete MFA challenge
  • Verify authentication app working (Microsoft Authenticator)

4. Service principal authentication:

# Test service principal
az login --service-principal \
  --username <app-id> \
  --password <password-or-cert> \
  --tenant <tenant-id>

5. Review conditional access policies:

  • Portal → Azure AD (Microsoft Entra ID) → Security → Conditional Access
  • May be blocking from certain locations/devices

Common AADSTS error codes:

  • AADSTS50126: Invalid username or password
  • AADSTS50076: MFA required
  • AADSTS50053: Account locked
  • AADSTS700016: Application not found in directory
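
When an AADSTS code is buried in a long CLI or log message, a small lookup can save a search. The sketch below covers only the four codes listed above. The function names are illustrative assumptions, and any other code falls through to a generic hint.

```shell
# Hypothetical helpers for triaging AADSTS errors (names are illustrative).

# Pull the first AADSTS code out of an error message, if there is one.
extract_aadsts() {
  printf '%s\n' "$1" | grep -oE 'AADSTS[0-9]+' | head -n 1
}

# Translate the codes listed in this guide into plain language.
explain_aadsts() {
  case "$1" in
    AADSTS50126)  echo "Invalid username or password" ;;
    AADSTS50076)  echo "MFA required" ;;
    AADSTS50053)  echo "Account locked" ;;
    AADSTS700016) echo "Application not found in directory" ;;
    *)            echo "Not in this list: check the official error reference" ;;
  esac
}

code=$(extract_aadsts "ERROR: AADSTS50076: additional verification needed")
explain_aadsts "$code"   # prints: MFA required
```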

"ResourceNotFound" or "NotFound" (404 errors)

What it means: Resource doesn't exist.

Causes:

  • Resource was deleted
  • Wrong resource group/subscription
  • Wrong region
  • Typo in resource name

Quick fixes:

1. Verify resource exists:

# List all resources in subscription
az resource list --output table

# Search for specific resource
az resource list --query "[?name=='MyResource']"

# Check specific resource group
az resource list --resource-group MyRG --output table

2. Check subscription context:

# Show current subscription
az account show

# List all subscriptions
az account list --output table

# Switch subscription
az account set --subscription "My Subscription"

3. Check resource group:

  • Resource may be in different RG than expected
  • Portal → Resource groups → Browse all

"StorageAccountAlreadyTaken"

What it means: Storage account name already in use.

Causes:

  • Storage account names are globally unique
  • Someone else using that name
  • You deleted account (name reserved 24-48 hours)

Quick fixes:

1. Choose different name:

  • Add random suffix: mystorageacct12345
  • Use company/project prefix

2. Check name availability:

az storage account check-name --name mystorageacct

3. Wait if recently deleted:

  • Names reserved up to 48 hours after deletion
  • Use different name meanwhile

Naming rules:

  • 3-24 characters
  • Lowercase letters and numbers only
  • Globally unique across all Azure
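
The first two naming rules can be checked locally before calling Azure at all. The function below is a hypothetical sketch (is_valid_storage_name is not an Azure command). It checks only length and character set, so global uniqueness still has to be confirmed with az storage account check-name.

```shell
# Hypothetical local pre-check for storage account names.
# Succeeds (exit 0) only for 3-24 characters of lowercase letters and digits.
is_valid_storage_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

if is_valid_storage_name "mystorageacct12345"; then
  echo "format ok, now check global uniqueness"
fi
```

Names with uppercase letters, hyphens, or underscores fail this check, which matches the most common cause of rejected names.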

"NetworkSecurityGroupCannotBeAttachedToGatewaySubnet"

What it means: NSG not allowed on gateway subnet.

Causes:

  • Trying to attach NSG to subnet containing VPN/ExpressRoute gateway
  • Azure restriction for gateway subnets

Quick fixes:

  1. Don't attach NSG to gateway subnet (by design)
  2. Use NSG on other subnets
  3. Use Azure Firewall for gateway subnet security

Note: This is expected behavior, not a bug.


"PublicIPAddressCannotBeDeleted" or resource locked

What it means: Resource can't be deleted while in use.

Causes:

  • Resource attached to another resource (e.g., NIC, load balancer)
  • Resource locked explicitly
  • Resource in use by service

Quick fixes:

1. Check resource dependencies:

  • Portal → Resource → Overview → See what it's attached to
  • Must detach/delete dependent resources first

2. Check for locks:

# List locks on resource
az lock list --resource-group MyRG

# Delete lock
az lock delete --name MyLock --resource-group MyRG

3. Deletion order (example for VM):

  • Stop VM
  • Delete VM
  • Delete network interface
  • Delete public IP
  • Delete virtual network
  • Delete resource group

"DeploymentFailed" (ARM template / App Service)

What it means: Deployment error.

Causes:

  • ARM template syntax error
  • Invalid parameter values
  • Quota exceeded
  • Dependency failure
  • App Service configuration issue

Quick fixes:

1. Check deployment logs:

  • Portal → Resource group → Deployments → Failed deployment → Error details

2. Validate ARM template:

az deployment group validate \
  --resource-group MyRG \
  --template-file template.json \
  --parameters @parameters.json

3. Check specific error message:

For App Service:

  • Check deployment logs: Portal → App Service → Deployment Center → Logs
  • Verify build succeeded
  • Check app settings/connection strings
  • Review Kudu logs: https://<app-name>.scm.azurewebsites.net

"Function execution timeout" (Azure Functions)

What it means: Function took too long to execute.

Causes:

  • Consumption plan timeout (default 5 minutes)
  • Long-running operation
  • External API slow
  • Cold start delay

Quick fixes:

1. Check timeout setting:

  • functionTimeout is set in host.json (in the function project root), not in the portal's application settings
  • Limits: Consumption defaults to 5 min and allows at most 10 min; Premium/Dedicated can be unlimited

2. Increase timeout (up to 10 minutes on Consumption; higher on Premium/Dedicated):

// host.json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
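
Because functionTimeout is written as hh:mm:ss, it is easy to misjudge whether a value fits the Consumption plan's 10-minute ceiling. The sketch below converts the value to seconds and compares it against 600. Both function names are illustrative assumptions, not Azure tooling.

```shell
# Hypothetical helpers for sanity-checking a functionTimeout value.

# Convert "hh:mm:ss" to a number of seconds.
timeout_to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeeds (exit 0) if the timeout is within the 10-minute Consumption maximum.
fits_consumption_plan() {
  [ "$(timeout_to_seconds "$1")" -le 600 ]
}

timeout_to_seconds "00:10:00"   # prints: 600
```

A value such as 00:30:00 fails the check, which signals that the function needs a Premium/Dedicated plan or a Durable Functions design.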

3. Optimize function:

  • Reduce external API calls
  • Use async/await properly
  • Cache data when possible
  • Break into smaller functions

4. Upgrade plan:

  • Consumption → Premium (no timeout limit)
  • Use Durable Functions for long-running workflows

"Storage account access denied" or "403 Forbidden"

What it means: Don't have permission to access storage.

Causes:

  • SAS token expired
  • Firewall blocking your IP
  • RBAC permissions insufficient
  • Public access disabled

Quick fixes:

1. Check firewall rules:

  • Portal → Storage account → Networking
  • Add your IP to allowed list
  • Or enable "Allow access from all networks" (testing only)

2. Verify SAS token:

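# Generate new SAS token (read-only, blob service, HTTPS only)
# Needs account credentials, e.g. --account-key or a connection string
az storage account generate-sas \
  --account-name mystorageacct \
  --services b \
  --resource-types co \
  --permissions r \
  --expiry 2026-12-31T23:59Z \
  --https-only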

3. Check RBAC:

  • Portal → Storage account → Access Control (IAM)
  • Verify you have "Storage Blob Data Reader" or similar role

4. Check public access:

  • Portal → Storage account → Configuration → Allow Blob public access
  • Must be enabled for anonymous access

Quick Fixes: Azure Not Working?

Fix #1: Clear Azure Portal Cache

Why it works: Cached portal data can cause errors.

How to clear:

  1. Azure Portal → Settings (gear icon) → Sign out all other sessions
  2. Clear browser cache (Ctrl+Shift+Del / Cmd+Shift+Del)
  3. Try incognito/private mode
  4. Hard refresh: Ctrl+Shift+R (Windows) / Cmd+Shift+R (Mac)

Portal-specific cache:

  • Portal → Settings → Reset all settings
  • Restores portal to defaults

Fix #2: Check Subscription Status and Credits

Subscription issues are common.

How to check:

  1. Go to Portal → Cost Management + Billing (successor to the Azure Account Center)
  2. Verify subscription status: "Active"
  3. Check payment method valid
  4. Check credits remaining (for free trial/MSDN)

Fix payment issues:

  • Update credit card
  • Pay outstanding invoices
  • Remove spending limit (if applicable)

Fix #3: Verify Region and Service Availability

Not all services are available in all regions, so check availability for your target region before deploying.

Examples:

  • Some VM sizes only in specific regions
  • Azure Bastion not in all regions

Solution:

  • Deploy to region with service availability
  • Or request service expansion (limited cases)

Fix #4: Use Azure CLI/PowerShell as Backup

Portal down? Use command line.

Azure CLI:

# Install Azure CLI
# macOS: brew install azure-cli
# Windows: Download from https://aka.ms/installazurecliwindows

# Login
az login

# Create resource group
az group create --name MyRG --location eastus

# Create VM
az vm create \
  --resource-group MyRG \
  --name MyVM \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --generate-ssh-keys

Azure PowerShell:

# Install Azure PowerShell
Install-Module -Name Az -AllowClobber -Scope CurrentUser

# Login
Connect-AzAccount

# Create resource group
New-AzResourceGroup -Name MyRG -Location "East US"

Pro tip: Learn CLI basics. The portal is convenient, but the CLI is faster and scriptable.


Fix #5: Check Resource Locks

Locks prevent accidental deletion/modification.

Check for locks:

# List all locks in subscription
az lock list --output table

# List locks on specific resource group
az lock list --resource-group MyRG --output table

Remove lock (if appropriate):

az lock delete --name MyLock --resource-group MyRG

Lock types:

  • ReadOnly: Can view, but can't modify or delete
  • CanNotDelete: Can modify, but can't delete

Common scenario:

  • Production resources often locked by governance policy
  • Contact admin to unlock temporarily

Fix #6: Review Activity Log

Activity log shows what happened.

Check activity log:

  • Portal → Resource → Activity log
  • Filter by time range and operation
  • Look for failed operations

Via CLI:

# Get activity log for resource group
az monitor activity-log list \
  --resource-group MyRG \
  --max-events 20 \
  --output table

What to look for:

  • Who made changes (caller; use the correlation ID to tie related operations together)
  • What failed (error messages)
  • When it happened (timestamp)

Fix #7: Check Service Health and Planned Maintenance

Azure announces planned maintenance.

Check Service Health:

  • Portal → Service Health → Planned maintenance
  • See upcoming maintenance windows
  • Can affect VM availability

RDP/SSH unavailable during maintenance:

  • VMs may reboot
  • Plan accordingly
  • Use availability sets/zones for HA

Fix #8: Restart or Redeploy Resource

Turn it off and on again.

Restart VM:

# Restart (keeps allocation)
az vm restart --name MyVM --resource-group MyRG

# Stop (deallocate) and start (new allocation)
az vm deallocate --name MyVM --resource-group MyRG
az vm start --name MyVM --resource-group MyRG

Restart App Service:

az webapp restart --name MyApp --resource-group MyRG

Restart Function App:

az functionapp restart --name MyFunctionApp --resource-group MyRG

When to restart:

  • Unresponsive service
  • After configuration change
  • Random errors
  • Performance degradation

Azure Portal Not Working?

Issue: Portal Loading Forever or "Unexpected error occurred"

Causes:

  • Browser cache corrupted
  • Browser extension interference
  • Network/proxy issue
  • Portal outage (rare)

Troubleshoot:

1. Try incognito/private mode:

  • Bypasses cache and extensions
  • If works, cache/extension is the issue

2. Clear browser cache:

  • Chrome: Settings → Privacy → Clear browsing data
  • Edge: Settings → Privacy → Choose what to clear
  • Firefox: Settings → Privacy → Clear Data

3. Disable browser extensions:

  • Ad blockers can interfere
  • Try disabling all extensions

4. Try different browser:

  • Chrome, Edge, Firefox, Safari

5. Check network:

  • Disable VPN
  • Try different network
  • Check firewall/proxy

6. Use Azure CLI:

  • Bypass portal entirely if down

Issue: Can't Find Resource in Portal

Causes:

  • Wrong subscription selected
  • Resource in different resource group
  • Resource deleted
  • No permissions to view

Troubleshoot:

1. Search all resources:

  • Portal → Search bar (top) → Type resource name
  • Shows resources across all subscriptions

2. Check subscription filter:

  • Portal → Settings (gear) → Directories + subscriptions
  • Verify correct subscriptions selected

3. Check resource group:

  • Portal → Resource groups → Browse all
  • Look for resource

4. Use Azure CLI:

# List subscriptions, then switch to the one that should hold the resource
az account list --output table
az account set --subscription "My Subscription"

# Find resource
az resource list --name MyResource --output table

Azure Virtual Machines Not Working?

Issue: Can't RDP or SSH to VM

Causes:

  • VM not running
  • NSG blocking port 3389/22
  • Public IP not assigned
  • VM agent not running
  • Password incorrect

Troubleshoot:

1. Check VM status:

az vm get-instance-view --name MyVM --resource-group MyRG --query instanceView.statuses

2. Start VM if stopped:

az vm start --name MyVM --resource-group MyRG

3. Check NSG rules:

# List NSG rules
az network nsg rule list --nsg-name MyNSG --resource-group MyRG --output table

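# Add RDP rule (port 3389), limited to your own public IP
# Avoid '*' as the source: it exposes RDP to the whole internet
az network nsg rule create \
  --nsg-name MyNSG \
  --resource-group MyRG \
  --name AllowRDP \
  --priority 1000 \
  --direction Inbound \
  --source-address-prefixes <your-public-ip> \
  --destination-port-ranges 3389 \
  --access Allow \
  --protocol Tcp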

4. Check public IP:

# Get VM public IP
az vm show --name MyVM --resource-group MyRG --show-details --query publicIps -o tsv

5. Reset password:

az vm user update \
  --resource-group MyRG \
  --name MyVM \
  --username azureuser \
  --password '<new-password>'

6. Use Serial Console (emergency access):

  • Portal → VM → Support + troubleshooting → Serial console
  • Works even if network broken

Issue: VM Running Slow or Unresponsive

Causes:

  • High CPU/memory usage
  • Disk throttling (I/O limits)
  • VM size too small
  • Software issue

Troubleshoot:

1. Check metrics:

  • Portal → VM → Metrics
  • Check CPU, memory, disk IOPS, network

2. Resize VM:

# List available sizes
az vm list-sizes --location eastus --output table

# Resize VM (requires restart)
az vm resize --resource-group MyRG --name MyVM --size Standard_D4s_v3

3. Check disk performance:

  • Standard HDD: Low IOPS (500 IOPS)
  • Standard SSD: Medium IOPS (500-6000 IOPS)
  • Premium SSD: High IOPS (120-20000 IOPS)

4. Upgrade disk:

az disk update --resource-group MyRG --name MyDisk --sku Premium_LRS

Azure App Service Not Working?

Issue: App Service Not Starting or "503 Service Unavailable"

Causes:

  • Application error on startup
  • Configuration issue
  • Insufficient App Service Plan size
  • Deployment failed

Troubleshoot:

1. Check application logs:

  • Portal → App Service → Log stream
  • Or download logs: Monitoring → App Service logs

2. Check Kudu console:

  • Navigate to https://<app-name>.scm.azurewebsites.net
  • Debug Console β†’ Check logs under LogFiles

3. Verify deployment succeeded:

  • Portal → Deployment Center → Logs
  • Check for build/deploy errors

4. Check app settings:

  • Portal → Configuration → Application settings
  • Verify connection strings correct
  • Check environment variables

5. Scale up App Service Plan:

  • Portal → App Service Plan → Scale up
  • Upgrade to higher tier if running out of resources

Issue: Deployment Failing

See "DeploymentFailed" error section above.

Additional checks:

  • Verify source control credentials
  • Check build logs
  • Test locally first
  • Review deployment slots (use staging slot)

Azure Functions Not Working?

Issue: Function Not Triggering

Causes:

  • Trigger configuration incorrect
  • Function disabled
  • Binding issue
  • Permission issue (e.g., storage account access)

Troubleshoot:

1. Check function status:

  • Portal → Function App → Functions → Your function
  • Verify "Enabled"

2. Check trigger configuration:

  • HTTP trigger: Correct HTTP method? Authorization level?
  • Timer trigger: CRON expression correct?
  • Queue trigger: Storage account accessible?
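
For timer triggers, one frequent mistake is pasting a five-field Linux crontab expression. Azure Functions timer triggers use six-field NCRONTAB expressions (second, minute, hour, day, month, day of week), so a copied expression usually needs a seconds field added at the front. The sketch below only counts fields and does not validate their contents. Both function names are illustrative assumptions.

```shell
# Hypothetical helpers: count the whitespace-separated fields in a schedule.
cron_field_count() {
  printf '%s\n' "$1" | awk '{ print NF }'
}

# Succeeds (exit 0) only if the expression has the six fields NCRONTAB expects.
is_six_field_cron() {
  [ "$(cron_field_count "$1")" -eq 6 ]
}

cron_field_count "0 */5 * * * *"   # prints: 6
cron_field_count "*/5 * * * *"     # prints: 5
```

The first expression runs every five minutes in NCRONTAB form. The second is the Linux-style equivalent that is missing the seconds field.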

3. Test manually:

  • Portal → Function → Code + Test → Run
  • See immediate error messages

4. Check application logs:

  • Portal → Function App → Monitor → Logs

5. Verify storage account connection:

  • Function Apps require storage account
  • Check connection string valid

Azure Storage Not Working?

Issue: Can't Upload or Download Blobs

See "Storage account access denied" error section above.

Additional checks:

  • Check storage account firewall
  • Verify SAS token not expired
  • Check CORS settings (for browser uploads)
  • Verify connection string correct

When Azure Actually Goes Down

What Happens

Recent major outages:

  • July 2024: Global Azure outage (DDoS attack on Azure infrastructure) - 10+ hours
  • January 2024: Azure AD outage (authentication failures) - 4 hours
  • September 2023: West Europe region outage (power issues) - 6 hours

Typical causes:

  1. Datacenter infrastructure failures (power, cooling, network)
  2. Azure AD/authentication platform issues
  3. Regional outages (weather, power grid)
  4. Software deployment bugs
  5. DDoS attacks
  6. Cascading failures

Impact:

  • Portal inaccessible
  • VMs unreachable
  • Services stopped
  • Authentication failures
  • Data temporarily unavailable (but not lost)

How Microsoft Responds

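Communication channels:

  • Azure Status page (status.azure.com)
  • Service Health in the Azure Portal
  • @Azure and @AzureSupport on Twitter/X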

Timeline:

  1. 0-30 min: Users report issues on Twitter/DownDetector
  2. 30-90 min: Microsoft posts investigating message
  3. 90-180 min: Regular updates (every 30-60 min)
  4. Resolution: Usually 2-12 hours for major outages
  5. Post-incident review (PIR): Posted to Service Health within 2 weeks

What to Do During Outages

1. Implement failover (if multi-region):

  • Traffic Manager: Automatic failover
  • Manual: Update DNS to secondary region
  • Activate DR (disaster recovery) plan

2. Communicate status:

  • Update status page
  • Email customers proactively
  • Tweet/social media updates

3. Monitor status:

  • Watch status.azure.com and Portal → Service Health for updates

4. Document impact:

  • Screenshot errors
  • Note affected resources
  • Track downtime duration
  • Use for SLA credit request

5. Don't make changes:

  • Wait for resolution
  • Don't try to "fix" during outage (may make worse)
  • Don't delete/recreate resources

Azure SLA credits:

  • VMs: 99.9% (single instance), 99.95% (availability set)
  • Storage: 99.9%
  • If SLA breached, request credit: Portal → Help + support → Service request
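
To judge whether an outage is likely to breach an SLA, it helps to turn the percentage into a monthly downtime budget. The sketch below does that arithmetic under the assumption of a 30-day month unless told otherwise. The function name is illustrative, and the actual credit rules are defined by the SLA terms for each service.

```shell
# Hypothetical helper: minutes of downtime a given SLA tolerates per month.
#   $1 = SLA percentage (e.g. 99.9), $2 = days in the month (default 30)
allowed_downtime_minutes() {
  LC_ALL=C awk -v sla="$1" -v days="${2:-30}" \
    'BEGIN { printf "%.1f\n", (100 - sla) / 100 * days * 24 * 60 }'
}

allowed_downtime_minutes 99.9    # prints: 43.2
allowed_downtime_minutes 99.95   # prints: 21.6
```

So a single-instance VM at 99.9% can be down for roughly 43 minutes in a 30-day month before the SLA is missed, and an availability set at 99.95% for roughly 22 minutes.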

Azure Down Checklist

Follow these steps in order:

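Step 1: Verify it's actually Azure

  • Check status.azure.com and Service Health in the Portal
  • Check API Status Check and DownDetector
  • Search Twitter/X for "Azure down"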

Step 2: Isolate the issue

  • Check if specific service or all Azure
  • Check if regional or global
  • Try Azure Portal in incognito/different browser
  • Try Azure CLI (bypass portal)

Step 3: Quick fixes (if Azure is up)

  • Clear browser cache and try portal again
  • Check subscription status (active? payment method valid?)
  • Verify signed in to correct account
  • Check quota limits
  • Review activity log for failed operations

Step 4: Service-specific troubleshooting

  • VMs: Check if running, verify NSG rules, check public IP
  • App Service: Check logs, verify deployment succeeded
  • Functions: Check trigger config, test manually
  • Storage: Check firewall, verify SAS token, check RBAC

Step 5: Advanced troubleshooting

  • Check resource locks
  • Review Service Health for planned maintenance
  • Restart/redeploy resource
  • Try different region (if possible)
  • Check for underlying service dependencies (e.g., Azure AD for auth)

Step 6: Contact support (if still not working)

  • Create support request: Portal → Help + support → New support request
  • Include: Subscription ID, resource names, error messages, correlation IDs
  • For production outages: Use Severity A (critical)

Prevent Future Issues

1. Implement Multi-Region Architecture

Don't put all eggs in one basket.

Best practices:

  • Deploy critical apps to 2+ regions
  • Use Azure Traffic Manager for automatic failover
  • Replicate storage (GRS/RA-GRS)
  • Test failover regularly

Example architectures:

  • Active-active: Traffic split between regions
  • Active-passive: Failover to secondary only when primary down

2. Set Up Azure Monitor and Alerts

Know about issues before customers do.

Key monitors:

  • VM availability and performance
  • App Service response times
  • Function execution failures
  • Storage account throttling

Create alerts:

# Create alert for VM CPU > 80%
az monitor metrics alert create \
  --name HighCPU \
  --resource-group MyRG \
  --scopes /subscriptions/.../resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action-group MyActionGroup

3. Implement Auto-Scaling

Handle load spikes automatically.

For App Service:

  • Portal → App Service Plan → Scale out
  • Set rules: CPU > 70% → add instance
  • Set max instances (budget control)

For VMs:

  • Use Virtual Machine Scale Sets (VMSS)
  • Auto-scale based on CPU, memory, or custom metrics

4. Use Azure Backup and Site Recovery

Protect against data loss.

Azure Backup:

  • VMs: Automatic backups
  • Files: Azure Files backup
  • Databases: SQL backup

Azure Site Recovery (ASR):

  • VM replication to secondary region
  • Automated failover
  • RTO: 2-4 hours, RPO: 5 minutes

5. Monitor Service Health and Subscribe to Alerts

Be proactive.

Set up Service Health alerts:

  • Portal → Service Health → Health alerts → Add service health alert
  • Filter by services you use
  • Get notified of incidents affecting your resources

6. Implement Infrastructure as Code (IaC)

Recreate resources quickly.

Tools:

  • ARM templates (Azure-native)
  • Terraform (multi-cloud)
  • Bicep (ARM simplified)

Benefits:

  • Version control infrastructure
  • Quick disaster recovery (redeploy from code)
  • Consistent environments

7. Review and Optimize Costs

Avoid surprise shutdowns due to budget.

Cost management:

  • Portal β†’ Cost Management + Billing
  • Set budgets and alerts
  • Right-size resources (don't over-provision)
  • Use reserved instances for predictable workloads
  • Stop dev/test resources when not in use

8. Keep Access Credentials Secure and Updated

Avoid lockouts.

Best practices:

  • Use Azure Key Vault for secrets
  • Rotate service principal credentials regularly
  • Use managed identities (no credentials to manage)
  • Enable MFA for admin accounts
  • Review and remove stale service principals

Key Takeaways

Before assuming Azure is down:

  1. ✅ Check Azure Status
  2. ✅ Check Service Health in Portal
  3. ✅ Check API Status Check
  4. ✅ Search Twitter for "Azure down"
  5. ✅ Try Azure CLI (bypass portal)

Common fixes:

  • Clear browser cache (portal issues)
  • Check subscription status and payment method
  • Verify quota limits (common blocker)
  • Check regional availability (not all services in all regions)
  • Restart or redeploy resource
  • Review activity log for specific error details

Configuration issues (NOT Azure down):

  • "Subscription disabled" = payment/billing issue
  • "Quota exceeded" = hit limits, request increase
  • "Allocation failed" = try different region/VM size
  • "Authentication failed" = verify credentials, check Azure AD
  • "ResourceNotFound" = verify subscription, resource group, region

VM issues:

  • Can't RDP/SSH = check NSG rules, verify VM running, check public IP
  • VM slow = check metrics, resize VM, upgrade disk tier
  • Start failed = allocation issue, try different region

App Service / Functions issues:

  • 503 errors = check logs, verify deployment, check app settings
  • Deployment failed = review logs, validate configuration
  • Function not triggering = check trigger config, test manually

If Azure is actually down:

  • Implement failover to secondary region (if multi-region setup)
  • Communicate with customers proactively
  • Monitor status page for updates
  • Document impact for SLA credit request
  • Don't make changes during outage

Prevent future issues:

  • Implement multi-region architecture
  • Set up Azure Monitor and alerts
  • Use auto-scaling for resilience
  • Enable Azure Backup and Site Recovery
  • Subscribe to Service Health alerts
  • Use Infrastructure as Code (ARM/Terraform)
  • Monitor costs and set budgets
  • Use managed identities and Key Vault

Remember: Most "Azure down" issues are configuration errors, quota limits, or subscription problems, not actual Azure outages. Check subscription status, quotas, and resource-specific logs before assuming a platform outage.


Need real-time Azure status monitoring? Track Azure uptime with API Status Check - Get instant alerts when Azure goes down.

