Logging and metrics delay in us-east-1 region

Incident Report for ESS (Public)

Resolved

This incident has been resolved.

Posted Mar 06, 2021 - 03:59 UTC

Update

Incident has been resolved. All logs queues are back to normal operating limits & log flow has been consistent for the past hour.

Posted Mar 06, 2021 - 03:58 UTC

Monitoring

Log flow has been consistently within expected limits for the past 30 minutes. We'll continue to monitor this incident and provide a further update in an hour.

Posted Mar 06, 2021 - 03:20 UTC

Update

Logging queues are continuing to modulate. We're assessing further tactics to solve this issue & expect to be able to provide an update in the next hour.

Posted Mar 06, 2021 - 02:12 UTC

Update

We are continuing to monitor the logging queue. We'll provide a further update in an hour

Posted Mar 06, 2021 - 00:48 UTC

Identified

We identified an unhealthy cloud instance causing abnormal behavior with one of the nodes in the logging deployment. The instance has been isolated and the logging pipeline is catching up on the queues. We continue to monitor the status of the incident.

Posted Mar 05, 2021 - 23:34 UTC

Update

We're continuing to investigate the delay. Please note there is no interruption to deployments or the ability to manage those. We'll update again in an hour

Posted Mar 05, 2021 - 23:14 UTC

Investigating

We are investigating a delay in indexing logs and metrics in us-east-1 region. We have identified that the delay has been caused by a ~10x spike in logging volume and working on identifying the source of the increased activity.

Posted Mar 05, 2021 - 21:49 UTC

This incident affected: AWS N. Virginia (us-east-1) (Deployment logging: AWS us-east-1).