Our cloud provider’s managed Kubernetes control plane experienced an issue that disrupted connectivity to cluster nodes, dropping some of the workloads in the cluster. Due to a bug in our auto-draining logic, detection of the issue was delayed. We have since fixed the alerting to ensure faster detection and response should a similar situation occur in the future.
Posted Jan 27, 2026 - 13:35 PST
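For illustration, below is a minimal sketch of the kind of independent node-health check that such alerting can rely on, written against the official Kubernetes Python client. The threshold, the print-based paging stand-in, and the overall structure are assumptions for this example, not a description of our production alerting.

```python
# Minimal sketch of a node-connectivity watchdog, assuming the official
# `kubernetes` Python client. Illustrative only; the threshold and the
# print() stand-in for paging are hypothetical.
from kubernetes import client, config

NOT_READY_THRESHOLD = 2  # hypothetical: page once this many nodes are unhealthy


def unhealthy_nodes(v1: client.CoreV1Api) -> list[str]:
    """Return names of nodes whose Ready condition is not True."""
    bad = []
    for node in v1.list_node().items:
        ready = next(
            (c for c in node.status.conditions or [] if c.type == "Ready"),
            None,
        )
        # A missing or non-True Ready condition means the control plane
        # has lost contact with the node's kubelet (status goes "Unknown").
        if ready is None or ready.status != "True":
            bad.append(node.metadata.name)
    return bad


def main() -> None:
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    bad = unhealthy_nodes(client.CoreV1Api())
    if len(bad) >= NOT_READY_THRESHOLD:
        print(f"ALERT: {len(bad)} nodes NotReady: {bad}")  # stand-in for paging


if __name__ == "__main__":
    main()
```

The point of a check like this is that it observes node health directly from the API server, independently of the drain automation, so a bug in the draining logic cannot also suppress detection.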
Resolved
We brought the London cluster back to full health at 21:40 UTC. This incident is now resolved. We will publish a postmortem as soon as possible, outlining the root cause and the steps we will take to prevent a recurrence.
Posted Jan 23, 2026 - 14:24 PST
Monitoring
All non-pinned traffic has been healthy since we began routing it through the nearest healthy nodes at 14:50 UTC. Our team is still working to bring the London cluster back to full health.
Posted Jan 23, 2026 - 10:38 PST
Update
We are continuing to investigate the root cause of the performance degradation in the London cluster and will share updates as we learn more.
Posted Jan 23, 2026 - 08:21 PST
Identified
We are routing traffic away from the London cluster to mitigate impact.
Posted Jan 23, 2026 - 06:50 PST
Investigating
We are currently investigating reports of elevated API latency in the UK region.
Posted Jan 23, 2026 - 06:39 PST
This incident affected: Regional Real Time Communication (United Kingdom - Real Time Communication).