Azure Virginia connectivity issues
Incident Report for ESS (Public)
Postmortem
Incident Date 2021-05-12
Reference Numbers CLOUD:81303
Incident Start Date (UTC) 2021-05-12T17:29:26
Incident End Date (UTC) 2021-05-12T20:00:45
Incident Duration 2:31:18
Impact Start Date (UTC) 2021-05-12T16:58:00
Impact End Date (UTC) 2021-05-12T18:13:00
Impact Duration 1 hour 15 minutes

Summary

Impact

Loss of access to virtual machines hosting customer deployment instances in Azure Virginia (azure-eastus2) zone 3 due to provider misconfiguration

Root Cause(s)

An incorrect configuration was applied to a storage scale unit by Microsoft resulting in impact on downstream services that relied on the storage resources provided by this scale unit, including several virtual machines we have provisioned to host customer deployments (allocators)

Resolution Action Items

Microsoft rolled back the storage configuration update, restoring access to the affected services, including the virtual machines included in this incident.

Timeline

UTC Date & Time Event
2021-05-12T16:58:00 Provider updates configuration of backend storage scale unit
2021-05-12T17:09:00 Elastic Cloud engineers receive first alert of issues with Azure Virginia zone 3
2021-05-12T17:29:26 Incident created for loss of connectivity to virtual machines in Azure Virginia zone 3
2021-05-12T17:43:00 Provider completes rollback of incorrect configuration of backend storage scale unit
2021-05-12T17:52:00 Incident posted on Elastic Cloud Status
2021-05-12T18:00:00 Received notification from Microsoft of a provider-related issue within this region related to storage scale unit change
2021-05-12T18:13:00 Elastic Cloud engineers confirmed access restored to affected virtual machines
2021-05-12T20:06:00 Incident marked as resolved on Elastic Cloud Status
Posted May 19, 2021 - 01:26 UTC

Resolved
This incident has been resolved and a full root cause will be posted within 3 business days.
Posted May 12, 2021 - 20:06 UTC
Monitoring
Connectivity with Azure Virginia (azure-eastus2) zone 3 has been restored, and there should no longer be any customer impact. We are working with our provider to determine the root cause for this outage. A full root cause will be posted within 3 business days.
Posted May 12, 2021 - 18:25 UTC
Identified
We have been notified of a provider-related infrastructure issue in this region and we are working with Azure to remediate this situation for affected customers.
Posted May 12, 2021 - 18:04 UTC
Investigating
We are currently investigating connectivity issue with Azure Virginia (azure-eastus2) Zone 3. This may affect connectivity to existing deployments as well as the ability to create new deployments in this region. We will post additional updates within 15 minutes.
Posted May 12, 2021 - 17:52 UTC
This incident affected: Azure Virginia (azure-eastus) (Watcher email actions, Deployment logging: Azure azure-eastus, App Search connectivity: Azure azure-eastus, Kibana connectivity: Azure azure-eastus, Enterprise Search connectivity: Azure azure-eastus, Deployment metrics: Azure azure-eastus, APM connectivity: Azure azure-eastus, Deployment snapshots: Azure azure-eastus, Elasticsearch connectivity: Azure azure-eastus, Deployment orchestration (Create/Edit/Restart/Delete): Azure azure-eastus, Deployment hosts: Azure azure-eastus).