Datadog Outage History

Past incidents and downtime events

Complete history of Datadog outages, incidents, and service disruptions. Showing 50 most recent incidents.


January 2026 (4 incidents)

criticalresolvedJan 29, 05:35 PM — Resolved Jan 29, 07:55 PM

Delayed Distribution Monitors Evaluations

6 updates

resolvedJan 29, 07:55 PM

This incident has been resolved.

monitoringJan 29, 07:09 PM

We are continuing to monitor the fix and will continue to provide regular updates.

monitoringJan 29, 06:36 PM

We have deployed a fix and we are monitoring the results. We will continue to provide regular updates.

identifiedJan 29, 06:06 PM

We are continuing to work on a fix for this issue. It is important to note that no data has been lost, and evaluations will be caught up once the service is operational again.

identifiedJan 29, 05:46 PM

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and evaluations will be caught up once the service is operational again.

investigatingJan 29, 05:35 PM

We are investigating delays in Monitors evaluations, which began at 17:15 UTC.

minorresolvedJan 28, 08:30 PM — Resolved Jan 28, 11:36 PM

Monitors - Delayed Evaluation

5 updates

resolvedJan 28, 11:36 PM

This incident has been resolved.

monitoringJan 28, 11:13 PM

A fix has been implemented and we are monitoring the results.

identifiedJan 28, 10:08 PM

The issue has been identified and a fix is being implemented.

investigatingJan 28, 09:21 PM

We are continuing to investigate this issue.

investigatingJan 28, 09:18 PM

We are investigating delays in the evaluation of service check monitors, which began at 20:26 UTC on Jan 28, 2026.

criticalresolvedJan 22, 06:51 PM — Resolved Jan 22, 07:27 PM

Web Application Not Loading

4 updates

resolvedJan 22, 07:27 PM

This incident has been resolved.

monitoringJan 22, 07:13 PM

A fix has been implemented and we are monitoring the results.

investigatingJan 22, 07:01 PM

We are continuing to investigate this issue.

investigatingJan 22, 06:51 PM

We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.

minorresolvedJan 18, 12:33 PM — Resolved Jan 18, 02:24 PM

Delayed Events

4 updates

resolvedJan 18, 02:24 PM

This incident is resolved. Events are no longer delayed, and there is no further impact on the event stream, event-based widgets, or event-based monitors.

identifiedJan 18, 01:26 PM

Recovery is in progress, and the new estimated time of recovery is 14:30 UTC.

identifiedJan 18, 12:35 PM

We have identified the issue and scaled up for recovery, which is estimated for around 14:30 UTC. We'll continue to give updates as recovery progresses.

identifiedJan 18, 12:33 PM

We are investigating increased latency processing Events. As a result of this issue, some users may see delays or gaps in the event stream, event-based widgets, or event-based monitors.

December 2025 (3 incidents)

majorresolvedDec 12, 09:49 PM — Resolved Dec 12, 11:43 PM

Delayed Processes data

5 updates

resolvedDec 12, 11:43 PM

This incident has been resolved.

monitoringDec 12, 11:31 PM

A fix has been implemented and we are monitoring the results.

identifiedDec 12, 10:41 PM

The issue has been identified and a fix is being implemented.

investigatingDec 12, 09:53 PM

We are continuing to investigate this issue.

investigatingDec 12, 09:49 PM

We are investigating increased latency processing Processes data. As a result of this issue, some users may see delays or gaps for data based on Process Monitoring. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

majorresolvedDec 12, 06:37 PM — Resolved Dec 12, 09:52 PM

Delayed APM metric ingestion

5 updates

resolvedDec 12, 09:52 PM

All impact related to APM metrics has been resolved. A separate incident has been created to track the remaining impact in live process data.

identifiedDec 12, 08:53 PM

We have identified the issue causing ingestion delays in APM and process metrics and are working on recovery.

investigatingDec 12, 07:26 PM

We are currently investigating lag in ingesting APM and process metrics, which affects monitor evaluation and in some cases has led to incorrect monitor alerts.

investigatingDec 12, 06:55 PM

We are continuing to investigate this issue.

investigatingDec 12, 06:37 PM

We are currently investigating lag in ingesting APM metrics, which affects monitor evaluation.

minorresolvedDec 9, 08:16 PM — Resolved Dec 9, 09:08 PM

Metrics data ingestion delayed and monitor evaluations degraded

4 updates

resolvedDec 9, 09:08 PM

This incident has been resolved. Live data is being processed normally and gaps in distribution metrics on graphs will be backfilled within the next hour.

monitoringDec 9, 09:00 PM

Live distribution metrics are available and being evaluated for all monitors. Gaps in graphs from the beginning of the incident are in the process of being backfilled.

identifiedDec 9, 08:36 PM

The issue has been identified and a fix is being implemented.

investigatingDec 9, 08:16 PM

We’re currently monitoring an issue causing delays in distribution metric processing in our US1 region.

November 2025 (5 incidents)

criticalresolvedNov 19, 07:08 PM — Resolved Nov 19, 07:44 PM

Web Application Not Loading

4 updates

resolvedNov 19, 07:44 PM

This incident has been resolved as of 2:32PM ET.

monitoringNov 19, 07:37 PM

A fix has been implemented and we are monitoring the results.

investigatingNov 19, 07:28 PM

We are continuing to investigate the issue with the web application. Data processing and alerting remain operational.

investigatingNov 19, 07:08 PM

We are investigating loading issues on our web application. As a result, some users might be getting errors when loading the web application.

minorresolvedNov 18, 01:17 PM — Resolved Nov 18, 01:53 PM

Delayed Monitors Notifications

2 updates

resolvedNov 18, 01:53 PM

This incident has been resolved. Notification delays were only affecting our internal monitoring and were due to the ongoing Cloudflare incident: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7/.

investigatingNov 18, 01:17 PM

We are investigating delays in RUM-based Monitors Notifications, which began at 11:30am UTC.

majorresolvedNov 17, 04:23 PM — Resolved Nov 17, 05:20 PM

Dashboards Not Loading

4 updates

resolvedNov 17, 05:20 PM

All errors stopped as of 12:02 ET. This incident has been resolved.

monitoringNov 17, 05:12 PM

The rollout of a fix is in progress. We are no longer seeing errors, are on the path to recovery, and are continuing to monitor the incident.

identifiedNov 17, 04:40 PM

The issue has been identified, and we are taking measures to mitigate it while working on a fix.

investigatingNov 17, 04:23 PM

We are investigating loading issues on the dashboard pages. As a result, some users might be getting errors when loading the web application. Please note that data processing and alerts are not affected by this incident.

minorresolvedNov 13, 06:58 PM — Resolved Nov 13, 07:07 PM

Web Application Not Loading

2 updates

resolvedNov 13, 07:07 PM

This incident has been resolved.

investigatingNov 13, 06:58 PM

minorresolvedNov 5, 03:42 PM — Resolved Nov 5, 05:57 PM

Delayed Metrics for APM and distribution metrics

4 updates

resolvedNov 5, 05:57 PM

APM metrics are now processing live.

identifiedNov 5, 04:59 PM

Distribution metrics should be back to processing live, without latency. APM metrics are still delayed, and we are actively working to bring them back to live processing.

identifiedNov 5, 04:25 PM

We have identified the root cause and scaled up processing to catch up with the lag.

identifiedNov 5, 03:42 PM

We are investigating increased latency processing Metrics from APM and distribution metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

October 2025 (5 incidents)

majorresolvedOct 28, 06:32 PM — Resolved Oct 28, 10:01 PM

Metrics data ingestion delayed and monitor evaluations degraded

2 updates

resolvedOct 28, 10:01 PM

This incident has been resolved.

identifiedOct 28, 06:32 PM

The issue has been identified and a fix is being implemented.

majorresolvedOct 20, 12:45 PM — Resolved Oct 22, 02:40 PM

Multiple products impacted with data delays

17 updates

resolvedOct 22, 02:40 PM

Backfills for Metrics and Log Management data have completed. All systems are back to normal.

monitoringOct 22, 09:10 AM

We are making progress on outstanding backfills. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 16:00 UTC.

monitoringOct 21, 09:35 PM

We are making progress on outstanding backfills. Cloud Cost Monitoring backfill is complete. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 10:00 UTC.

monitoringOct 21, 06:04 PM

We are continuing to work on outstanding backfills, which are not yet fully complete. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 22:00 UTC.

monitoringOct 21, 10:20 AM

All products have been stable since the last update. We are continuing to work on outstanding backfills. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 16:00 UTC.

monitoringOct 21, 01:32 AM

We are seeing recovery across all of our products, and live data and monitor evaluations have resumed for all affected products. Most historical data in Logs has been backfilled and we have a small number of ongoing backfills in Metrics and other products. We will continue to monitor the situation overnight, and our next update will be 09:00 UTC.

identifiedOct 21, 12:25 AM

We are seeing recovery for APM. We continue to see delays in processing that impact the following products: Distribution Metrics, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

identifiedOct 20, 10:41 PM

Logs data have been backfilled, and users should no longer see gaps in their historical logs. Log Archives and Log Forwarding were paused between 15:00 and 18:30 UTC, and we are working to re-forward any logs from that time period. We continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

identifiedOct 20, 10:40 PM

We are seeing recovery in Profiling. Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress. In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

identifiedOct 20, 09:47 PM

We are seeing recovery in AWS Metrics. Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress. In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

identifiedOct 20, 08:14 PM

We are seeing progress in telemetry data coming from AWS into Datadog. We are seeing our capacity requests being fulfilled more slowly than usual. App Builder and Workflow Automation are seeing recovery. Processing is still delayed for multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products. Other product pages using the same underlying product data will be impacted as well.

identifiedOct 20, 07:01 PM

We are seeing progress in telemetry data coming from AWS into Datadog. We are also starting to see our capacity requests being fulfilled. Processing is still delayed for multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products. Other product pages using the same underlying product data will be impacted as well. App Builder and Workflow Automation are also experiencing elevated errors. As a result, customers might not be able to query applications, and workflows might take longer to execute.

identifiedOct 20, 06:04 PM

APM, RUM, Log Management, Profiling, CCM, and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those products. Other product pages using the same underlying product data will be impacted as well. We are working on bringing new capacity online, and for all products except RUM we expect the data to be backfilled once the service is fully operational again. App Builder and Workflow Automation are also experiencing elevated errors. As a result, customers might not be able to query applications, and workflows might take longer to execute. Due to upstream provider issues, we are also continuing to see unavailability of telemetry data coming from AWS into Datadog.

identifiedOct 20, 05:05 PM

identifiedOct 20, 03:18 PM

We are still seeing increased latency processing for those products and the associated monitors are delayed. We are continuing to work on bringing new capacity online and will continue to provide updates on this issue.

identifiedOct 20, 02:07 PM

We are investigating increased latency processing APM, RUM, Log Management, and Profiling. As a result of this issue, some users may see only a subset of their data when querying those products. Other product pages using the same underlying product data will be impacted as well. Monitors using the impacted data are delayed. We are working on bringing new capacity online and will provide an update once the service is fully operational again.

identifiedOct 20, 12:45 PM

We are investigating increased latency processing APM, RUM, Log Management, and Profiling. As a result of this issue, some users may see only a subset of their data when querying those products. Other product pages using the same underlying product data will be impacted as well. We are working on bringing new capacity online, and the data will be backfilled once the service is fully operational again.

majorresolvedOct 20, 10:14 AM — Resolved Oct 20, 11:31 AM

Multiple products impacted with data delays

3 updates

resolvedOct 20, 11:31 AM

This incident has been resolved.

monitoringOct 20, 10:53 AM

We are monitoring and seeing recovery for all products. Some customers might still experience delays for a limited subset of data in logs or host vulnerability scanning specific to AWS us-east-1. We will post specific information on the affected product pages for those customers. On-Call notifications are fully operational.

identifiedOct 20, 10:14 AM

Note: this is a delayed update because this incident impaired our ability to update the status page. We posted banners in the product earlier to let customers know about the ongoing impact. We are still seeing some delays as we fully recover from the underlying incident: agentless vulnerability scanning for hosts in AWS us-east-1 is still delayed, and On-Call notifications are not fully recovered. This incident started at 07:10 UTC on October 20. So far we have fully recovered from the impact on Synthetics, collection of data from AWS, Bits AI, Codegen, and Dashboards (editing features were impaired).

criticalresolvedOct 14, 05:14 PM — Resolved Oct 14, 07:19 PM

Delayed AWS, GCP, Azure, SaaS integrations Metrics and Logs

5 updates

resolvedOct 14, 07:19 PM

This incident has been resolved.

monitoringOct 14, 06:44 PM

Data flow has been restored for new incoming data. We are currently backfilling historical data.

monitoringOct 14, 05:43 PM

A fix has been implemented, and we are monitoring the results.

identifiedOct 14, 05:29 PM

The issue has been identified and a fix is being implemented.

investigatingOct 14, 05:14 PM

We are investigating increased latency processing AWS, GCP and Azure Metrics. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

minorresolvedOct 1, 05:24 PM — Resolved Oct 1, 06:56 PM

Delayed Metrics

4 updates

resolvedOct 1, 06:56 PM

We’ve confirmed that this issue only impacts customers using the OCI integrations feature. The vast majority of customers are not impacted. Impacted customers will see an in-app banner when visiting any Datadog product page. The banner will be removed once the issue is resolved. Since the impact is localized, we are closing the status page.

investigatingOct 1, 05:58 PM

We are continuing to investigate this issue.

investigatingOct 1, 05:51 PM

We are investigating increased latency processing a subset of Metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

investigatingOct 1, 05:24 PM

We are investigating increased latency processing Metrics. As a result of this issue, some users may see delays or gaps for metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

September 2025 (5 incidents)

minorresolvedSep 29, 06:17 PM — Resolved Sep 29, 07:43 PM

Host Tags, Service Checks, and Datadog Events Delayed Evaluation

4 updates

resolvedSep 29, 07:43 PM

This incident has been resolved.

monitoringSep 29, 07:36 PM

Service Check and Datadog Events monitor evaluation has recovered and data is up-to-date. Host tag updates are still recovering and stale host tags may appear in the frontend.

identifiedSep 29, 06:20 PM

The issue has been identified and a fix is being implemented.

investigatingSep 29, 06:17 PM

We’re currently investigating an issue causing delayed processing of host tag updates, Service Checks, and Datadog Events, which may result in stale data appearing in the frontend. Our team is actively working to mitigate and fully resolve this. We’ll follow up as soon as the issue has been resolved. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

majorresolvedSep 18, 03:11 PM — Resolved Sep 18, 04:24 PM

[SSO] Login Errors from Google SSO

3 updates

resolvedSep 18, 04:24 PM

This incident has been resolved.

monitoringSep 18, 03:47 PM

We are seeing recovery in Google SSO logins. We are continuing to monitor for issues.

investigatingSep 18, 03:11 PM

We are investigating user login issues with the web application via Google SSO. Please note that data processing and alerts are not affected by this incident.

minorresolvedSep 17, 03:25 PM — Resolved Sep 17, 04:08 PM

Delayed Metrics

3 updates

resolvedSep 17, 04:08 PM

This incident has been resolved.

monitoringSep 17, 03:59 PM

We have deployed a fix. The impact is limited to APM Metrics. We are monitoring and will provide another update once the issue is fully resolved.

investigatingSep 17, 03:25 PM

minorresolvedSep 5, 04:11 PM — Resolved Sep 5, 04:37 PM

Delayed Monitors Notifications

4 updates

resolvedSep 5, 04:37 PM

This incident has been resolved.

monitoringSep 5, 04:31 PM

A fix has been implemented and we are monitoring the results.

identifiedSep 5, 04:15 PM

The issue has been identified and a fix is being implemented.

investigatingSep 5, 04:11 PM

We are investigating delays in Monitors Notifications for distribution metrics, which began at 3PM UTC.

minorresolvedSep 2, 02:13 PM — Resolved Sep 2, 03:16 PM

Delayed RUM data

3 updates

resolvedSep 2, 03:16 PM

This incident has been resolved.

monitoringSep 2, 02:46 PM

A fix has been implemented and we are monitoring the results.

investigatingSep 2, 02:13 PM

We are investigating increased latency processing RUM data. As a result of this issue, some users may see gaps or delays in RUM graphs as well as empty or partial query results on RUM Sessions, RUM Analytics, RUM Application, and Error Tracking pages.

August 2025 (5 incidents)

minorresolvedAug 28, 07:30 PM — Resolved Aug 30, 06:58 PM

Periodic network interruption communicating with multiple Azure regions

5 updates

resolvedAug 30, 06:58 PM

Our monitoring has shown Azure’s fix to be stable since our last update. This incident has been resolved.

monitoringAug 30, 03:12 AM

Azure has implemented a permanent fix to the network issue. Both Azure and Datadog engineers are continuing to monitor overnight and will provide an update tomorrow.

monitoringAug 29, 08:47 PM

Azure has temporarily mitigated the network capacity issues which have caused episodic packet loss for customers who are hosted in Azure data centers and are using Datadog’s US1 region (accessible via https://app.datadoghq.com). Azure engineers are continuing to work to fully resolve this issue. Until they fully resolve the issue, customers with Datadog agents running in Azure data centers may see brief periods of delayed ingestion of data from agents and from Azure integrations. We don’t expect a noticeable impact thanks to agent buffering but cannot exclude the possibility of spurious alerts due to temporarily delayed data. We are continuing to monitor the situation in conjunction with Azure, and will do so throughout the weekend. We will post status page updates as soon as the situation improves, and at least every 24 hours. We thank you for your patience throughout this incident.

monitoringAug 28, 08:37 PM

The root cause of the issue has been identified, and Microsoft has implemented mitigations. We are monitoring network traffic to confirm.

investigatingAug 28, 07:30 PM

Degraded network capacity in an Azure datacenter is causing network packet loss and increased latency when communicating with AWS in eastern US regions. Customers may experience communications failures trying to submit data from agents running in AWS and may experience delayed data from AWS integrations. Microsoft has identified the root cause and is working on mitigations.

noneresolvedAug 28, 04:39 AM — Resolved Aug 28, 09:05 AM

PagerDuty Monitor Notifications Delayed

3 updates

resolvedAug 28, 09:05 AM

PagerDuty notification deliveries are back to normal.

monitoringAug 28, 08:03 AM

We are observing some recovery in notification delays and continue to monitor the situation. Please follow our integration status page for details: https://datadogintegrations.statuspage.io/

investigatingAug 28, 04:39 AM

Monitor Notifications are delayed for PagerDuty.

minorresolvedAug 27, 05:58 PM — Resolved Aug 27, 08:53 PM

Partial metrics drop from Datadog Agent in the westus2 Azure region to Datadog US1 datacenter

2 updates

resolvedAug 27, 08:53 PM

We noticed a partial data drop from Datadog Agents in the westus2 Azure region to the Datadog US1 datacenter. Data is no longer being dropped, and we are monitoring the situation.

monitoringAug 27, 05:58 PM

We noticed a partial metrics drop from Datadog Agents in the westus2 Azure region to the Datadog US1 datacenter. We are actively investigating.

minorresolvedAug 21, 09:57 PM — Resolved Aug 21, 11:40 PM

Duplicate Logs in Aggregated Queries

4 updates

resolvedAug 21, 11:40 PM

This incident has been resolved.

monitoringAug 21, 11:35 PM

A fix has been implemented and we are monitoring the results.

identifiedAug 21, 11:29 PM

The issue has been identified and a fix is being implemented.

investigatingAug 21, 09:57 PM

We are investigating an issue processing Logs. As a result of this issue, some users may see inconsistencies in logs queries.

minorresolvedAug 5, 04:38 PM — Resolved Aug 5, 05:36 PM

Degraded Web Application

4 updates

resolvedAug 5, 05:36 PM

This incident has been resolved.

monitoringAug 5, 04:44 PM

We have identified the issue and implemented a fix. We are monitoring the recovery of the impacted products.

investigatingAug 5, 04:39 PM

We are continuing to investigate this issue.

investigatingAug 5, 04:38 PM

Due to access control failures, we're seeing downstream impact on multiple products. Our team is actively working to identify the root cause and resolve the issue. We will provide a more specific update shortly.

July 2025 (3 incidents)

majorresolvedJul 18, 03:30 PM — Resolved Jul 18, 04:09 PM

Google SSO login errors

3 updates

resolvedJul 18, 04:09 PM

This incident has been resolved.

identifiedJul 18, 03:41 PM

Google declared an incident regarding this issue: https://www.google.com/appsstatus/dashboard/incidents/oFcAZTr4EVieF5Fr6Ee9

investigatingJul 18, 03:30 PM

We are investigating user login issues with the web application via Google SSO. Please note that data processing and alerts are not affected by this incident.

noneresolvedJul 9, 08:46 PM — Resolved Jul 9, 09:46 PM

Degraded Web Application Performance & Monitor Evaluations

4 updates

resolvedJul 9, 09:46 PM

This incident has been resolved.

monitoringJul 9, 09:26 PM

We've implemented a fix and we're seeing recovery in monitor evaluations and dashboards. We'll continue to investigate and monitor for further impact.

identifiedJul 9, 09:06 PM

We've identified a possible root cause and we're actively working on mitigating the impact.

investigatingJul 9, 08:46 PM

We're investigating an issue with our metrics and monitor evaluations that is causing degraded web application performance and skipped monitors.

majorresolvedJul 7, 02:39 PM — Resolved Jul 7, 05:07 PM

Monitors - Delayed Evaluation of logs monitors

3 updates

resolvedJul 7, 05:07 PM

This incident has been resolved.

monitoringJul 7, 03:27 PM

The team rolled out a change and has been seeing recovery. The team will continue monitoring for a period of time.

investigatingJul 7, 02:39 PM

We are investigating delays in the evaluation of logs-based monitors, which began at 01:30:00 PM UTC.

June 2025 (3 incidents)

majorresolvedJun 24, 09:16 PM — Resolved Jun 24, 09:48 PM

Logs Monitors - Delayed Evaluations

3 updates

resolvedJun 24, 09:48 PM

This incident has been resolved.

identifiedJun 24, 09:32 PM

The issue has been identified and a fix is being implemented.

investigatingJun 24, 09:16 PM

We are investigating delays in Logs Monitors Evaluations, which began at 8:46 PM UTC.

majorresolvedJun 9, 10:47 PM — Resolved Jun 10, 12:06 AM

Delayed processing of APM Trace Metrics

4 updates

resolvedJun 10, 12:06 AM

This incident has been resolved.

monitoringJun 9, 11:25 PM

A fix has been implemented and we are monitoring the results.

identifiedJun 9, 11:11 PM

The issue has been identified and a fix is being implemented.

investigatingJun 9, 10:47 PM

We are investigating delayed processing of APM Trace metrics starting around 21:40 UTC. Dashboards and monitors relying on these metrics are affected.

majorresolvedJun 5, 03:08 PM — Resolved Jun 5, 03:46 PM

Elevated error rates in queries across multiple products

4 updates

resolvedJun 5, 03:46 PM

This incident has been resolved.

monitoringJun 5, 03:29 PM

All impact to query systems has recovered. Note that data intake and alerting were not impacted during this incident. We are continuing to monitor the status of the fix.

identifiedJun 5, 03:18 PM

The issue has been identified and a fix is being implemented.

investigatingJun 5, 03:08 PM

We are actively investigating issues querying data affecting multiple products. As a result of this issue, there might be errors when trying to load data from queries on different pages of the web application or through the API.

May 2025 (3 incidents)

minorresolvedMay 22, 05:46 PM — Resolved May 22, 07:09 PM

Monitors - Delayed Evaluation

3 updates

resolvedMay 22, 07:09 PM

This incident has been resolved.

identifiedMay 22, 06:10 PM

The issue has been identified and a fix is being implemented.

investigatingMay 22, 05:46 PM

We are investigating delays in Distribution Monitors Evaluation, which began at 5:30pm UTC. Monitors for other types of metrics are evaluating as usual.

minorresolvedMay 13, 07:06 PM — Resolved May 13, 10:28 PM

Delayed Traces and Spans in APM

5 updates

resolvedMay 13, 10:28 PM

The incident is now resolved. APM trace ingestion and all downstream systems, including monitors, have fully recovered and are up to date.

monitoringMay 13, 08:25 PM

We are monitoring a fix for increased processing latency in APM Metrics. APM data in live view is current, but distributed tracing metrics are delayed by 20 minutes. Monitors sourced from the data are impacted until the data becomes current.

investigatingMay 13, 07:33 PM

As a result of the issue, we are monitoring delays in Monitors Evaluation.

monitoringMay 13, 07:20 PM

A fix has been implemented and we are monitoring the results.

investigatingMay 13, 07:06 PM

We are investigating increased latency processing Traces and Spans in APM. As a result of this issue, some users may see missing or delayed Traces and Spans starting at 18:33 UTC.

majorresolvedMay 2, 12:56 AM — Resolved May 2, 02:13 AM

Delayed AWS Metrics and Events

4 updates

resolvedMay 2, 02:13 AM

This incident has been resolved.

identifiedMay 2, 01:52 AM

A fix has been implemented and recovery is in progress. To prevent spurious alerts, monitors on AWS Metrics and Events remain disabled until recovery is complete.

identifiedMay 2, 01:19 AM

The issue has been identified and a fix is being implemented.

investigatingMay 2, 12:56 AM

We are investigating increased latency processing AWS metrics and events. As a result of this issue, some users may see delays or gaps in graphs that contain these metrics and events. To prevent spurious alerts, we have temporarily disabled monitors based on this data.

April 2025 (1 incident)

minorresolvedApr 16, 01:37 PM — Resolved Apr 16, 01:52 PM

Monitors - Delayed Evaluation

3 updates

resolvedApr 16, 01:52 PM

This incident has been resolved.

investigatingApr 16, 01:50 PM

The incident has fully recovered. The service is now fully operational.

investigatingApr 16, 01:37 PM

We are investigating delays in Monitors Evaluation, which began at 12:45 UTC.

March 2025 (2 incidents)

minorresolvedMar 26, 08:03 PM — Resolved Mar 26, 08:24 PM

Delayed processing of APM Trace Metrics

5 updates

resolvedMar 26, 08:24 PM

This incident has been resolved.

monitoringMar 26, 08:16 PM

We are continuing to monitor for any further issues.

monitoringMar 26, 08:16 PM

A fix has been implemented and we are monitoring the results.

identifiedMar 26, 08:09 PM

The issue has been identified and a fix is being implemented.

investigatingMar 26, 08:03 PM

We are investigating delayed processing of APM Trace metrics starting around 07:00 UTC. Dashboards and monitors relying on these metrics are affected.

minorresolvedMar 25, 07:03 PM — Resolved Mar 26, 12:17 AM

Login Issues

4 updates

resolvedMar 26, 12:17 AM

This incident has been resolved.

identifiedMar 25, 09:34 PM

We are continuing to work on a fix for this issue.

identifiedMar 25, 07:24 PM

The issue has been identified and a fix is being implemented.

investigatingMar 25, 07:03 PM

We are investigating user login issues related to reCAPTCHA for customers using password login. If you experience an issue with reCAPTCHA, refreshing the page can often mitigate the issue. Please note that data processing and alerts are not affected by this incident.

February 2025(1 incident)

minorresolvedFeb 23, 09:30 AM — Resolved Feb 23, 03:32 PM

Delayed Processing for a Subset of Metrics

7 updates

resolvedFeb 23, 03:32 PM

This incident has been resolved.

monitoringFeb 23, 02:17 PM

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identifiedFeb 23, 01:06 PM

We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost: data is being backfilled and will be available once the service is operational again.

identifiedFeb 23, 11:24 AM

identifiedFeb 23, 10:33 AM

We have identified the underlying issue and continue to work on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.

identifiedFeb 23, 10:02 AM

We have identified the underlying issue and are working on a fix. It is important to note that no data has been lost, and it will be backfilled and available once the service is operational again.

investigatingFeb 23, 09:30 AM

We are investigating increased latency processing Trace Metrics. As a result of this issue, some users may see delays or gaps for a subset of their metrics on graphs and statistics on Service Catalog.

January 2025(3 incidents)

minorresolvedJan 31, 04:03 PM — Resolved Jan 31, 07:03 PM

Degraded Web Application Performance

5 updates

resolvedJan 31, 07:03 PM

This incident has been resolved.

monitoringJan 31, 06:22 PM

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identifiedJan 31, 04:49 PM

We have identified the underlying issue and are continuing to work on a fix. Degraded web application performance is primarily observed in customers with low network bandwidth.

identifiedJan 31, 04:29 PM

We have identified the underlying issue and are working on a fix.

investigatingJan 31, 04:03 PM

We are investigating degraded performance with the web application.

minorresolvedJan 17, 01:42 PM — Resolved Jan 17, 04:50 PM

Increased delay processing events

6 updates

resolvedJan 17, 04:50 PM

This incident has been resolved.

monitoringJan 17, 03:34 PM

We are continuing to monitor the progress of processing the backlog in Events. The majority of the backlog has been processed. Event Monitor evaluation remains delayed while we finish processing the backlog.

identifiedJan 17, 02:31 PM

We've implemented a fix, and are currently working through the backlog of delayed Events. Event Monitor evaluation remains delayed while we work through the backlog. All other monitor types have recovered and are currently evaluating.

identifiedJan 17, 01:55 PM

We have identified the issue causing delayed ingestion of Events. Alerting evaluation continues to be delayed for Event Monitors, Process Monitors, and Cloud Network monitors. All other monitor types have recovered and are currently evaluating.

investigatingJan 17, 01:46 PM

We are continuing to investigate this issue.

investigatingJan 17, 01:42 PM

We are investigating increased latency processing Events. As a result of this issue, some users may see delays in the event stream or for event queries on dashboards, and event alert evaluation is delayed. This issue also caused a delay in the processing of alerts across other products. We've implemented a fix for this, and are monitoring the recovery of the alert evaluation pipeline. As a result, a subset of alerts may be delayed while the system recovers.

minorresolvedJan 3, 05:13 AM — Resolved Jan 3, 05:34 AM

APM connections retrying

4 updates

resolvedJan 3, 05:34 AM

This incident has been resolved.

monitoringJan 3, 05:28 AM

We have mitigated the cause of transient agent submission errors for APM, and customers should no longer observe these errors. The Datadog Agent automatically retried the failed submissions, which succeeded on retry; this incident did not result in any data loss.

identifiedJan 3, 05:15 AM

The issue has been identified and a fix is being implemented.

investigatingJan 3, 05:13 AM

Some US1 customers are experiencing degraded performance for APM. Customers may see transient errors, but these should resolve with an automatic retry from the Datadog Agent.

December 2024(1 incident)

majorresolvedDec 4, 06:48 PM — Resolved Dec 4, 09:50 PM

Delayed APM Distribution Metrics, Data Streams Monitoring Metrics & Monitor Notifications

6 updates

resolvedDec 4, 09:50 PM

This incident has been resolved.

monitoringDec 4, 07:43 PM

A fix has been implemented and we are monitoring the results.

identifiedDec 4, 07:05 PM

Data Streams Monitoring metrics and associated monitor notifications based on these metrics have recovered.

identifiedDec 4, 06:53 PM

We are continuing to work on a fix for this issue.

identifiedDec 4, 06:49 PM

The issue has been identified and a fix is being implemented.

investigatingDec 4, 06:48 PM

We are investigating increased latency in processing APM Distribution Metrics and Data Streams Monitoring Metrics, as well as monitor notifications based on these metrics, which began at 17:47 UTC. As a result of this issue, some users may see delays or gaps for these metrics on graphs, including APM pages, as well as delayed monitor notifications.

November 2024(4 incidents)

majorresolvedNov 26, 07:12 PM — Resolved Nov 26, 08:54 PM

Delayed APM data ingestion

3 updates

resolvedNov 26, 08:54 PM

This incident has been resolved.

monitoringNov 26, 07:45 PM

A fix has been implemented and systems are recovering.

investigatingNov 26, 07:12 PM

We are investigating increased ingestion latency of APM data.

minorresolvedNov 20, 04:16 PM — Resolved Nov 20, 06:09 PM

Monitors - Delayed Evaluation for Distribution Metric Monitors

6 updates

resolvedNov 20, 06:09 PM

This incident has been resolved.

monitoringNov 20, 06:02 PM

We have rolled out a fix and all distribution monitors are up to date. We are continuing to monitor the customer experience and expect to resolve this incident in the next 30 minutes.

identifiedNov 20, 05:42 PM

We are in the process of rolling out a fix that will bring all distribution monitors up to date. We will update again when the issue is resolved.

identifiedNov 20, 05:07 PM

The root cause has been identified. We are working on a fix so that distribution metric monitor evaluations are up to date.

investigatingNov 20, 04:46 PM

We are investigating delays in monitor evaluations for monitors based on distribution metrics, starting at 15:35 UTC. This is causing a delay in notifications.

investigatingNov 20, 04:16 PM

We are investigating delays in Distribution Metric Monitors Evaluation, which began at 15:35 UTC.

minorresolvedNov 20, 04:36 PM — Resolved Nov 20, 05:06 PM

Monitors - Delayed Evaluation

3 updates

resolvedNov 20, 05:06 PM

This incident has been resolved.

monitoringNov 20, 04:53 PM

A fix has been implemented and we are monitoring the results.

investigatingNov 20, 04:36 PM

We are investigating delays in Events-based Monitor Evaluation, which began at 16:00 UTC.

majorresolvedNov 15, 01:33 PM — Resolved Nov 15, 02:51 PM

Delayed Distribution Metrics

4 updates

resolvedNov 15, 02:51 PM

This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.

monitoringNov 15, 02:05 PM

A fix has been implemented and we are monitoring the results.

investigatingNov 15, 01:33 PM

We are continuing to investigate this issue.

investigatingNov 15, 01:33 PM

We are investigating increased latency processing Distribution Metrics. As a result, some users may see delays or gaps for distribution metrics on graphs, including APM pages. Monitors based on this data may also be delayed. We have identified the problem and are actively working to resolve the issue.

October 2024(2 incidents)

minorresolvedOct 17, 06:16 PM — Resolved Oct 17, 07:28 PM

Delayed distribution metrics & monitor notifications

4 updates

resolvedOct 17, 07:28 PM

This incident has been resolved.

monitoringOct 17, 07:03 PM

We have deployed a fix and we are monitoring the results. We will provide another update once the issue is fully resolved.

identifiedOct 17, 06:36 PM

We have identified the underlying issue and are working on a fix.

investigatingOct 17, 06:16 PM

We are investigating delays in distribution metrics and in notifications for monitors based on these metrics, which began at 17:40 UTC.

majorresolvedOct 11, 07:52 PM — Resolved Oct 11, 10:20 PM

Delayed Distribution Metrics

5 updates

resolvedOct 11, 10:20 PM

This incident has been resolved. All distribution metrics are being processed and monitors are no longer disabled for distribution metrics.

monitoringOct 11, 09:13 PM

A fix has been implemented and we are monitoring the results.

identifiedOct 11, 08:35 PM

We are continuing to work on a fix for this issue.

identifiedOct 11, 07:57 PM

The issue has been identified and remediation steps are underway.

investigatingOct 11, 07:52 PM

We are investigating increased latency processing Distribution Metrics. As a result of this issue, some users may see delays or gaps for distribution metrics on graphs. To prevent spurious alerts, we have temporarily disabled monitors based on distribution metrics.