Cluster connectivity issues in us-west-2
Incident Report for Elastic Cloud
Resolved
This incident has been resolved.
Posted 4 months ago. Feb 27, 2019 - 10:33 UTC
Update
We are continuing to monitor for any further issues.
Posted 4 months ago. Feb 27, 2019 - 09:01 UTC
Update
100% of affected clusters have been restored, and we are closely monitoring the stability of the region.
Posted 4 months ago. Feb 27, 2019 - 09:00 UTC
Update
99% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 08:10 UTC
Update
98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 07:43 UTC
Update
98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 07:14 UTC
Update
98% of affected clusters have been restored. We're finalising the last clusters and will have them green as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 06:41 UTC
Update
98% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 06:07 UTC
Update
96% of affected clusters have been restored. We're working on the remaining clusters and will have them restored as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 05:35 UTC
Update
We've restored 92% of clusters affected by the original issue.
Posted 4 months ago. Feb 27, 2019 - 04:57 UTC
Update
We've restored 84% of clusters affected by the original issue. We're continuing to restore unhealthy nodes in the remaining clusters to bring everyone back to green as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 04:12 UTC
Update
We're continuing to restore unhealthy nodes in clusters affected by the original issue. 57% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as possible.
Posted 4 months ago. Feb 27, 2019 - 03:34 UTC
Monitoring
We've restored access to the customer control plane and will continue to monitor its availability. Customers can now create new clusters or update any existing green clusters within us-west-2. We're proceeding with restoring unhealthy nodes in clusters affected by the original issue and will provide updates as we progress.
Posted 4 months ago. Feb 27, 2019 - 02:57 UTC
Update
We’re still working on restoring access to the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. We’ve taken steps to minimise any further impact on cluster availability. Any green clusters will remain green, but we recommend no changes be made. We’re working on restoring this functionality. Rest assured we have all hands on deck. We’re unable to resolve any issues with previously impacted clusters until customer control plane access is restored.
Posted 4 months ago. Feb 27, 2019 - 02:19 UTC
Update
We’re still working on restoring access to the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. We’ve taken steps to minimise any further impact on cluster availability. Any green clusters will remain green, but we recommend no changes be made. We’re working on restoring this functionality. Rest assured we have all hands on deck. We’re unable to resolve any issues with previously impacted clusters until customer control plane access is restored.
Posted 4 months ago. Feb 27, 2019 - 01:49 UTC
Update
We’ve encountered a major issue while restoring stability to the platform, which has resulted in an outage of the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. Any green clusters will remain green. We’re working on restoring this functionality. Rest assured we have all hands on deck. We're unable to resolve any issues with previously impacted clusters until customer control plane access is restored.
Posted 4 months ago. Feb 27, 2019 - 01:14 UTC
Update
We’ve encountered a major issue while restoring stability to the platform, which has resulted in an outage of the customer control plane. Customers will be unable to create/update clusters within us-west-2, but access to clusters via any clients or apps is unaffected. Any green clusters will remain green. We’re working on restoring this functionality. Rest assured we have all hands on deck. We're unable to resolve any issues with previously impacted clusters until customer control plane access is restored.
Posted 4 months ago. Feb 27, 2019 - 01:11 UTC
Update
Our fixes are progressing and we’re continuing to work on this incident. 30% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can. We've encountered a hurdle that has slowed progress, but we are actively working to remove it. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.
Posted 4 months ago. Feb 27, 2019 - 00:46 UTC
Update
Our fixes are progressing and we’re continuing to work on this incident. 24% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.
Posted 4 months ago. Feb 27, 2019 - 00:14 UTC
Update
Our fixes are progressing and we’re continuing to work on this incident. 24% of affected clusters should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can.
Posted 4 months ago. Feb 26, 2019 - 23:40 UTC
Update
Our work is still progressing, and several fixes are in flight. Some customers should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can.
Posted 4 months ago. Feb 26, 2019 - 23:07 UTC
Update
Our work is progressing, and several fixes are in flight. Some customers should no longer be experiencing issues, and we continue our efforts to bring everyone back to green as soon as we can.
Posted 4 months ago. Feb 26, 2019 - 22:40 UTC
Update
We’re still working hard on remediation for the deployments affected by this issue, and we appreciate your patience.
Posted 4 months ago. Feb 26, 2019 - 22:02 UTC
Update
We are continuing remediation efforts for the deployments that were affected by this issue.
Posted 4 months ago. Feb 26, 2019 - 21:33 UTC
Update
This issue affects only a single availability zone in us-west-2. The customer impact is limited to single-zone cluster setups that happened to be in the affected AZ. Clusters that have HA setups across multiple AZs (our recommended configuration) should not see cluster availability issues, and we will recover any nodes that are in the affected AZ automatically for you.
Posted 4 months ago. Feb 26, 2019 - 21:05 UTC
Identified
We are working to remediate a problem that occurred in us-west-2. A misconfigured autoscaling group has resulted in the loss of a number of hosts. We will post more information as we work through the issue.
Posted 4 months ago. Feb 26, 2019 - 20:37 UTC
This incident affected: Cluster Management (Cluster Management Console Service, Cluster Management API) and AWS Oregon (us-west-2) (Cluster Connectivity: AWS us-west-2).