Datadog Outage History
Past incidents and downtime events
Complete history of Datadog outages, incidents, and service disruptions. Showing 50 most recent incidents.
January 2026 (4 incidents)
Delayed Distribution Monitors Evaluations
6 updates
This incident has been resolved.
We are continuing to monitor the fix and will continue to provide regular updates.
We have deployed a fix and we are monitoring the results. We will continue to provide regular updates.
We are continuing to work on a fix for this issue. It is important to note that no data has been lost, and evaluations will be caught up once the service is operational again.
We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and evaluations will be caught up once the service is operational again.
We are investigating delays in Monitors evaluations, which began at 17:15 UTC.
Monitors - Delayed Evaluation
5 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating delays in service checks monitors evaluation, which began at 20:26 1/28/2026 UTC.
Web Application Not Loading
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.
Delayed Events
4 updates
This incident is resolved. There is no longer any delay in the processing of Events, nor any impact on the event stream, event based widgets, or event based monitors.
Recovery is in progress and the new estimated time of recovery is 14h30 UTC.
We have identified the issue and scaled up for recovery, with a recovery estimated to be around 14h30 UTC. We'll continue to give updates as recovery progresses.
We are investigating increased latency processing Events. As a result of this issue, some users may see delays or gaps in the event stream or for event based widgets or event based monitors.
December 2025 (3 incidents)
Delayed Processes data
5 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating increased latency processing Processes data. As a result of this issue, some users may see delays or gaps for data based on Process Monitoring. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Delayed APM metric ingestion
5 updates
All impact related to APM metrics has been resolved. A separate incident has been created to track the remaining impact in live process data.
We have identified the issue causing ingestion delays in APM and process metrics and are working on recovery.
We are currently investigating lag in ingesting APM and process metrics, which affects monitor evaluation and in some cases has led to incorrect monitor alerts.
We are continuing to investigate this issue.
We are currently investigating lag in ingesting APM metrics, which affects monitor evaluation.
Metrics data ingestion delayed and monitor evaluations degraded
4 updates
This incident has been resolved. Live data is being processed normally and gaps in distribution metrics on graphs will be backfilled within the next hour.
Live distribution metrics are available and being evaluated for all monitors. Gaps in graphs from the beginning of the incident are in the process of being backfilled.
The issue has been identified and a fix is being implemented.
We’re currently monitoring an issue causing delays in distribution metric processing in our US1 region.
November 2025 (5 incidents)
Web Application Not Loading
4 updates
This incident has been resolved as of 2:32PM ET.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate the issue with the web application. Data processing and alerting remain operational.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application.
Delayed Monitors Notifications
2 updates
This incident has been resolved. Notification delays were only affecting our internal monitoring and were due to the ongoing Cloudflare incident: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7/.
We are investigating delays in RUM-based Monitors Notifications, which began at 11:30am UTC.
Dashboards Not Loading
4 updates
All errors stopped as of 12:02ET. This incident has been resolved.
The rollout of a fix is in progress and we're no longer seeing errors. We are monitoring the incident and are on the path to recovery.
The issue has been identified and we are taking measures to mitigate it, as well as working on a fix.
We are investigating loading issues on the dashboard pages. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.
Web Application Not Loading
2 updates
This incident has been resolved.
We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.
Delayed Metrics for APM and distribution metrics
4 updates
APM metrics are now processing live.
The distribution metrics should be back to processing live, without latency. APM metrics are still delayed, and we're actively working on getting them back to live.
We have identified the root cause, and scaled up the processing to catch up with the lag.
We are investigating increased latency processing Metrics from APM and distribution metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
October 2025 (5 incidents)
Metrics data ingestion delayed and monitor evaluations degraded
2 updates
This incident has been resolved.
The issue has been identified and a fix is being implemented.
Multiple products impacted with data delays
17 updates
Backfills for Metrics and Log Management data have completed. All systems are back to normal.
We are making progress on outstanding backfills. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 16:00 UTC.
We are making progress on outstanding backfills. Cloud Cost Monitoring backfill is complete. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 10:00 UTC.
We are continuing the work on outstanding backfills which are not yet fully complete. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 22:00 UTC.
All products have been stable since the last update. We are continuing the work on outstanding backfills. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 16:00 UTC.
We are seeing recovery across all of our products, and live data and monitor evaluations have resumed for all affected products. Most historical data in Logs has been backfilled and we have a small number of ongoing backfills in Metrics and other products. We will continue to monitor the situation overnight, and our next update will be 09:00 UTC.
We are seeing recovery for APM. We continue to see delays in processing that impact the following products: Distribution Metrics, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.
Logs data have been backfilled, and users should no longer see gaps in their historical logs. Log Archives and Log Forwarding were paused between 15:00 and 18:30 UTC, and we are working to re-forward any logs from that time period. We continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.
We are seeing recovery in Profiling. Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress. In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.
We are seeing recovery in AWS Metrics. Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress. In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, Profiling, CCM and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.
We are seeing progress in telemetry data coming from AWS into Datadog. We are starting to see our capacity requests being fulfilled more slowly than usual. App Builder and Workflow Automation are seeing recovery. Our processing is still delayed, impacting multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well.
We are seeing progress in telemetry data coming from AWS into Datadog. Also, we are starting to see our capacity requests being fulfilled. Our processing is still delayed, impacting multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well. App Builder and Workflow Automation are also experiencing elevated errors; as a result, customers might not be able to query applications and workflows might take longer to execute.
APM, RUM, Log Management, Profiling, CCM and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well. We are working on bringing new capacity online and for all products except RUM we expect the data will be backfilled once the service is fully operational again. App Builder and Workflow Automation are also experiencing elevated errors; as a result, customers might not be able to query applications and workflows might take longer to execute. Due to upstream provider issues, we are also continuing to see unavailability of telemetry data coming from AWS into Datadog.
APM, RUM, Log Management, Profiling, CCM and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well. We are working on bringing new capacity online and for all products except RUM we expect the data will be backfilled once the service is fully operational again. App Builder and Workflow Automation are also experiencing elevated errors; as a result, customers might not be able to query applications and workflows might take longer to execute. Due to upstream provider issues, we are also continuing to see unavailability of telemetry data coming from AWS into Datadog.
We are still seeing increased latency processing for those products and the associated monitors are delayed. We are continuing to work on bringing new capacity online and will continue to provide updates on this issue.
We are investigating increased latency processing APM, RUM, Log Management and Profiling. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well. Monitors using the impacted data are delayed. We are working on bringing new capacity online and will provide an update once the service is fully operational again.
We are investigating increased latency processing APM, RUM, Log Management and Profiling. As a result of this issue, some users may see only a subset of their data when querying those different products, other product pages using the same underlying product data will be impacted as well. We are working on bringing new capacity online and the data will be backfilled once the service is fully operational again.
Multiple products impacted with data delays
3 updates
This incident has been resolved.
We are monitoring and seeing recovery for all products. Some customers might still experience delays for a limited subset of logs or host vulnerability scanning data specific to AWS us-east-1. We will post specific information on the affected product pages for those customers. On-Call notifications are fully operational.
Note: this is a delayed update because this incident impaired our ability to update the status page. We posted banners in the product earlier to let customers know about the ongoing impact. We are still seeing some delays as we fully recover from the underlying incident: agentless vulnerability scanning for hosts in AWS us-east-1 is still delayed, and On-Call notifications are not fully recovered. This incident started at 07:10 UTC on October 20. So far we have fully recovered from the impact on Synthetics, collection of data from AWS, Bits AI, Codegen, and Dashboards (editing features were impaired).
Delayed AWS, GCP, Azure, SaaS integrations Metrics and Logs
5 updates
This incident has been resolved.
Data flow has been restored for new incoming data. We are currently backfilling historical data.
A fix has been implemented, and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing AWS, GCP and Azure Metrics. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Delayed Metrics
4 updates
We’ve confirmed that this issue only impacts customers using the OCI integrations feature. The vast majority of customers are not impacted. Impacted customers will see an in-app banner when visiting any Datadog product page. The banner will be removed once the issue is resolved. Since the impact is localized, we are closing the status page.
We are continuing to investigate this issue.
We are investigating increased latency processing a subset of Metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
September 2025 (5 incidents)
Host Tags, Service Checks, and Datadog Events Delayed Evaluation
4 updates
This incident has been resolved.
Service Check and Datadog Events monitor evaluation has recovered and data is up-to-date. Host tag updates are still recovering and stale host tags may appear in the frontend.
The issue has been identified and a fix is being implemented.
We’re currently investigating an issue causing delayed processing of host tag updates, Service Checks and Datadog Events, which may result in stale data appearing in the frontend. Our team is actively working to mitigate and fully resolve this. We’ll follow up as soon as the issue has been resolved. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
[SSO] Login Errors from Google SSO
3 updates
This incident has been resolved.
We are seeing recovery in Google SSO logins. We are continuing to monitor for issues.
We are investigating user login issues with the web application via Google SSO. Please note that data processing and alerts are not affected by this incident.
Delayed Metrics
3 updates
This incident has been resolved.
We have deployed a fix. The impact is limited to APM Metrics. We are monitoring and will provide another update once the issue is fully resolved.
We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
Delayed Monitors Notifications
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating delays in Monitors Notifications for distribution metrics, which began at 3PM UTC.
Delayed RUM data
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating increased latency processing RUM data. As a result of this issue, some users may see gaps or delays in RUM graphs as well as empty or partial query results on RUM Sessions, RUM Analytics, RUM Application, and Error Tracking pages.
August 2025 (5 incidents)
Periodic network interruption communicating with multiple Azure regions
5 updates
Our monitoring has shown Azure’s fix to be stable since our last update. This incident has been resolved.
Azure has implemented a permanent fix to the network issue. Both Azure and Datadog engineers are continuing to monitor overnight and will provide an update tomorrow.
Azure has temporarily mitigated the network capacity issues which have caused episodic packet loss for customers who are hosted in Azure data centers and are using Datadog’s US1 region (accessible via https://app.datadoghq.com). Azure engineers are continuing to work to fully resolve this issue. Until they fully resolve the issue, customers with Datadog agents running in Azure data centers may see brief periods of delayed ingestion of data from agents and from Azure integrations. We don’t expect a noticeable impact thanks to agent buffering but cannot exclude the possibility of spurious alerts due to temporarily delayed data. We are continuing to monitor the situation in conjunction with Azure, and will do so throughout the weekend. We will post status page updates as soon as the situation improves, and at least every 24 hours. We thank you for your patience throughout this incident.
The root cause of the issue has been identified and Microsoft has implemented mitigations. We are monitoring network traffic to confirm.
Degraded network capacity in an Azure datacenter is causing network packet loss and increased latency when communicating with AWS in eastern US regions. Customers may experience communications failures trying to submit data from agents running in AWS and may experience delayed data from AWS integrations. Microsoft has identified the root cause and is working on mitigations.
PagerDuty Monitor Notifications Delayed
3 updates
PagerDuty notification deliveries are back to normal.
We are observing some recovery from notification delays and continue to monitor the situation. Please follow our integration status page for details https://datadogintegrations.statuspage.io/
Monitor Notifications are delayed for PagerDuty.
Partial metrics drop from Datadog Agent in the westus2 Azure region to Datadog US1 datacenter
2 updates
We noticed a partial data drop from the Datadog Agent in the westus2 Azure region to the Datadog US1 datacenter. Data is no longer being dropped, and we are monitoring the situation.
We noticed a partial metrics drop from the Datadog Agent in the westus2 Azure region to the Datadog US1 datacenter. We are actively investigating.
Duplicate Logs in Aggregated Queries
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating an issue processing Logs. As a result of this issue, some users may see inconsistencies in logs queries.
Degraded Web Application
4 updates
This incident has been resolved.
We have identified the issue and implemented a fix. We are monitoring the recovery of the impacted products.
We are continuing to investigate this issue.
Due to access control failures, we're seeing downstream impact to multiple products. Our team is actively working to identify the root cause and resolve the issue. We will provide a more specific update shortly.
July 2025 (3 incidents)
Google SSO login errors
3 updates
This incident has been resolved.
Google declared an incident regarding this issue: https://www.google.com/appsstatus/dashboard/incidents/oFcAZTr4EVieF5Fr6Ee9
We are investigating user login issues with the web application via Google SSO. Please note that data processing and alerts are not affected by this incident.
Degraded Web Application Performance & Monitor Evaluations
4 updates
This incident has been resolved.
We've implemented a fix and we're seeing recovery in monitor evaluations and dashboards. We'll continue to investigate and monitor for further impact.
We've identified a possible root cause and we're actively working on mitigating the impact.
We're investigating an issue with our metrics and monitor evaluations, causing degraded web application performance and skipped monitors.
Monitors - Delayed Evaluation of logs monitors
3 updates
This incident has been resolved.
The team rolled out a change and has been seeing recovery. The team will continue monitoring for a period of time.
We are investigating delays in Monitors Evaluation of logs-based monitors, which began at 01:30:00 PM UTC.
June 2025 (3 incidents)
Logs Monitors - Delayed Evaluations
3 updates
This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are investigating delays in Logs Monitors Evaluations, which began at 8:46 PM UTC.
Delayed processing of APM Trace Metrics
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating delayed processing of APM Trace metrics starting around 21:40 UTC. Dashboards and monitors relying on these metrics are affected.
Elevated error rates in queries across multiple products
4 updates
This incident has been resolved.
All impact to query systems has recovered. Note that data intake and alerting were not impacted during this incident. We are continuing to monitor the status of the fix.
The issue has been identified and a fix is being implemented.
We are actively investigating issues querying data affecting multiple products. As a result of this issue, there might be errors when trying to load data from queries on different pages of the web application or through the API.
May 2025 (3 incidents)
Monitors - Delayed Evaluation
3 updates
This incident has been resolved.
The issue has been identified and a fix is being implemented.
We are investigating delays in Distribution Monitors Evaluation, which began at 5:30pm UTC. Monitors for other types of metrics are evaluating as usual.
Delayed Traces and Spans in APM
5 updates
The incident is now resolved. APM trace ingestion and all downstream systems, including monitors, have fully recovered and are up to date.
We are monitoring a fix for increased latency processing in APM Metrics. APM data in live view is current but distributed tracing metrics are delayed by 20 minutes. Monitors sourced from the data are impacted until the data becomes current.
As a result of the issue, we are monitoring delays in Monitors Evaluation.
A fix has been implemented and we are monitoring the results.
We are investigating increased latency processing Traces and Spans in APM. As a result of this issue, some users may see missing or delayed traces and spans starting at 18:33 UTC.
Delayed AWS Metrics and Events
4 updates
This incident has been resolved.
A fix has been implemented and recovery is in progress. To prevent spurious alerts, monitors on AWS Metrics and Events remain disabled until recovery is complete.
The issue has been identified and a fix is being implemented.
We are investigating increased latency processing AWS metrics and events. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics and events. To prevent spurious alerts, we have temporarily disabled monitors based on this data.
April 2025 (1 incident)
Monitors - Delayed Evaluation
3 updates
This incident has been resolved.
The incident has fully recovered. The service is now fully operational.
We are investigating delays in Monitors Evaluation, which began at 12:45 UTC.
March 2025 (2 incidents)
Delayed processing of APM Trace Metrics
5 updates
This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are investigating delayed processing of APM Trace metrics starting around 07:00 UTC. Dashboards and monitors relying on these metrics are affected.
Login Issues
4 updates
This incident has been resolved.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are investigating user login issues related to reCAPTCHA for customers using password login. If you experience an issue with reCAPTCHA, refreshing the page can often mitigate the issue. Please note that data processing and alerts are not affected by this incident.
February 2025 (1 incident)
Delayed Processing for a Subset of Metrics
7 updates
This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost: data is being backfilled and will be available once the service is operational again.
We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost: data is being backfilled and will be available once the service is operational again.
We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.
We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.
We are investigating increased latency processing Trace Metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs and statistics on Service Catalog.
January 2025 (3 incidents)
Degraded Web Application Performance
5 updates
This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We have identified the underlying issue and are continuing to work on a fix. Degraded web application performance is primarily observed in customers with low network bandwidth.
We have identified the underlying issue and are working on a fix.
We are investigating degraded performance with the web application.
Increased delay processing events
6 updates
This incident has been resolved.
We are continuing to monitor the progress of processing the backlog in Events. The majority of the backlog has been processed. Event Monitor evaluation remains delayed while we finish processing the backlog.
We've implemented a fix, and are currently working through the backlog of delayed Events. Event Monitor evaluation remains delayed while we work through the backlog. All other monitor types have recovered and are currently evaluating.
We have identified the issue causing delayed ingestion of Events. Alerting evaluation continues to be delayed for Event Monitors, Process Monitors, and Cloud Network monitors. All other monitor types have recovered and are currently evaluating.
We are continuing to investigate this issue.
We are investigating increased latency processing Events. As a result of this issue, some users may see delays in the event stream or for event queries on dashboards, and event alert evaluation is delayed. This issue also caused a delay in the processing of alerts across other products. We've implemented a fix for this, and are monitoring the recovery of the alert evaluation pipeline. As a result, a subset of alerts may be delayed while the system recovers.
APM connections retrying
4 updates
This incident has been resolved.
We have mitigated the cause of transient agent submission errors for APM and customers should no longer observe these errors. The Datadog Agent automatically retries these errors, and the retries succeeded; this incident did not result in any data loss.
The issue has been identified and a fix is being implemented.
Some US1 customers are experiencing degraded performance for APM. Customers may see transient errors, but these should resolve with an automatic retry from the Datadog Agent.
December 2024 (1 incident)
Delayed APM Distribution Metrics, Data Streams Monitoring Metrics & Monitor Notifications
6 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Data Streams Monitoring metrics and associated monitor notifications based on these metrics have recovered.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are investigating increased latency in processing APM Distribution Metrics and Data Streams Monitoring Metrics as well as monitors notifications based on these metrics, which began at 17h47 UTC. As a result of this issue, some users may see delays or gaps for these metrics on graphs, including APM pages as well as delayed monitor notifications.
November 2024 (4 incidents)
Delayed APM data ingestion
3 updates
This incident has been resolved.
A fix has been implemented and systems are recovering.
We are investigating increased ingestion latency of APM data.
Monitors - Delayed Evaluation for Distribution Metric Monitors
6 updates
This incident has been resolved.
We have rolled out a fix and all distribution monitors are up to date. We are continuing to monitor the customer experience and expect to resolve this incident in the next 30 minutes.
We are in the process of rolling out a fix that will bring all distribution monitors up to date. We will update again when the issue is resolved.
The root cause has been identified. We are working on a fix so that distribution metric monitor evaluations are up to date.
We are investigating delays in monitor evaluations for monitors based on distribution metrics, starting at 15h35 UTC. This is causing a delay in notifications.
We are investigating delays in Distribution Metric Monitors Evaluation, which began at 15h35 UTC.
Monitors - Delayed Evaluation
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating delays in Events-based Monitor Evaluation, which began at 16:00 UTC.
Delayed Distribution Metrics
4 updates
This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are investigating increased latency processing Distribution Metrics. As a result, some users may see delays or gaps for distribution metrics on graphs, including APM pages. Monitors based on this data may also be delayed. We have identified the problem and are actively working to resolve the issue.
October 2024 (2 incidents)
Delayed distribution metrics & monitor notifications
4 updates
This incident has been resolved.
We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.
We have identified the underlying issue and are working on a fix.
We are investigating delays in distribution metrics and in monitor notifications for monitors based on these metrics, which began at 17:40 UTC.
Delayed Distribution Metrics
5 updates
This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.
A fix has been implemented and we are monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and remediation steps are underway.
We are investigating increased latency processing Distribution Metrics. As a result of this issue, some users may see delays or gaps for distribution metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on distribution metrics.