Cluster connectivity issues in us-west-2

Incident Report for Elastic Cloud (Public)

Resolved

This incident has been resolved.

Posted Feb 27, 2019 - 10:33 UTC

Update

We are continuing to monitor for any further issues.

Posted Feb 27, 2019 - 09:01 UTC

Update

100% of affected clusters have been restored, and we are closely monitoring the stability of the region.

Posted Feb 27, 2019 - 09:00 UTC

Update

99% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.

Posted Feb 27, 2019 - 08:10 UTC

Update

98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.

Posted Feb 27, 2019 - 07:43 UTC

Update

98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.

Posted Feb 27, 2019 - 07:14 UTC

Update

98% of affected clusters have been restored. We're finalising the last clusters and will have them green as soon as possible.

Posted Feb 27, 2019 - 06:41 UTC

Update

98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.

Posted Feb 27, 2019 - 06:07 UTC

Update

96% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.

Posted Feb 27, 2019 - 05:35 UTC

Update

We've restored 92% of clusters affected by the original issue.

Posted Feb 27, 2019 - 04:57 UTC

Update

We've restored 84% of clusters affected by the original issue. We're continuing to restore unhealthy nodes in the remaining clusters to bring everyone back to green as soon as possible.

Posted Feb 27, 2019 - 04:12 UTC

Update

We're continuing to restore unhealthy nodes in clusters affected by the original issue. 57% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as possible.

Posted Feb 27, 2019 - 03:34 UTC

Monitoring

We've restored access to the customer control plane and will continue to monitor its availability. Customers will now be able to create new clusters or update any existing green clusters within us-west-2. We're proceeding with restoring unhealthy nodes in clusters affected by the original issues and will provide updates as we progress.

Posted Feb 27, 2019 - 02:57 UTC

Update

We’re still working on restoring access to the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. We’ve taken steps to minimise any further impact on cluster availability. Any green clusters will remain green, but we recommend no changes be made. We’re working on restoring this functionality. Rest assured we have all hands on deck. We’re unable to resolve any issues with previously impacted clusters until customer control plane access is restored.

Posted Feb 27, 2019 - 02:19 UTC

Update

Posted Feb 27, 2019 - 01:49 UTC

Update

We’ve encountered a major issue while restoring stability to the platform, which has resulted in an outage of the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. Any green clusters will remain green. We’re working on restoring this functionality. Rest assured we have all hands on deck. We're unable to resolve any issues with previously impacted clusters until customer control plane access is restored.

Posted Feb 27, 2019 - 01:14 UTC

Update

Posted Feb 27, 2019 - 01:11 UTC

Update

Our fixes are progressing and we’re continuing to work on this incident. 30% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can. We've encountered a hurdle that has slowed progress, but we are actively working to remove this. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.

Posted Feb 27, 2019 - 00:46 UTC

Update

Our fixes are progressing and we’re continuing to work on this incident. 24% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.

Posted Feb 27, 2019 - 00:14 UTC

Update

Posted Feb 26, 2019 - 23:40 UTC

Update

Our work is still progressing, and several fixes are in flight. Some customers should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can.

Posted Feb 26, 2019 - 23:07 UTC

Update

Our work is progressing, and several fixes are in flight. Some customers should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can.

Posted Feb 26, 2019 - 22:40 UTC

Update

We’re still working hard on remediation for the deployments affected by this issue, and we appreciate your patience.

Posted Feb 26, 2019 - 22:02 UTC

Update

We are continuing remediation efforts for the deployments that were affected by this issue.

Posted Feb 26, 2019 - 21:33 UTC

Update

This issue affects only a single availability zone (AZ) in us-west-2. The customer impact is limited to single-zone cluster setups that happened to be in the affected AZ. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.

Posted Feb 26, 2019 - 21:05 UTC

Identified

We are working to remediate a problem that occurred in us-west-2. A misconfigured autoscaling group has resulted in the loss of a number of hosts. We will post more information as we work through the issue.

Posted Feb 26, 2019 - 20:37 UTC

This incident affected: AWS Oregon (us-west-2) (Elasticsearch connectivity: AWS us-west-2).