Our cloud provider’s managed Kubernetes control plane experienced an issue that disrupted connectivity to cluster nodes, dropping some of the workloads in the cluster. Due to a bug in our auto-draining logic, detection of the issue was delayed. We have since fixed the alerting to ensure faster detection and response should a similar situation occur in the future.
Posted Jan 27, 2026 - 13:35 PST
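For illustration, below is a minimal sketch of the kind of independent node-health check that such alerting can rely on, written against the official Kubernetes Python client. The threshold, the print-based paging stand-in, and the overall structure are assumptions for this example, not a description of our production alerting.

```python
# Minimal sketch of a node-connectivity watchdog, assuming the official
# `kubernetes` Python client. Illustrative only; the threshold and the
# print() stand-in for paging are hypothetical.
from kubernetes import client, config

NOT_READY_THRESHOLD = 2  # hypothetical: page once this many nodes are unhealthy


def unhealthy_nodes(v1: client.CoreV1Api) -> list[str]:
    """Return names of nodes whose Ready condition is not True."""
    bad = []
    for node in v1.list_node().items:
        ready = next(
            (c for c in node.status.conditions or [] if c.type == "Ready"),
            None,
        )
        # A missing or non-True Ready condition means the control plane
        # has lost contact with the node's kubelet (status goes "Unknown").
        if ready is None or ready.status != "True":
            bad.append(node.metadata.name)
    return bad


def main() -> None:
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    bad = unhealthy_nodes(client.CoreV1Api())
    if len(bad) >= NOT_READY_THRESHOLD:
        print(f"ALERT: {len(bad)} nodes NotReady: {bad}")  # stand-in for paging


if __name__ == "__main__":
    main()
```

The point of a check like this is that it observes node health directly from the API server, independently of the drain automation, so a bug in the draining logic cannot also suppress detection.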
Resolved
We brought the London cluster back to full health at 21:40 UTC. This incident is now resolved. We will publish a postmortem as soon as possible, outlining the root cause and the steps we will take to prevent a recurrence.
Posted Jan 23, 2026 - 14:24 PST
Monitoring
All non-pinned traffic has been healthy since we began routing it through the nearest healthy nodes at 14:50 UTC. Our team is still working to bring the London cluster back to full health.
Posted Jan 23, 2026 - 10:38 PST
Update
We are continuing to investigate the root cause of the performance degradation in the London cluster and will share updates as we learn more.
Posted Jan 23, 2026 - 08:21 PST
Identified
We are routing traffic away from the London cluster to mitigate impact.
Posted Jan 23, 2026 - 06:50 PST
Investigating
We are currently investigating reports of elevated API latency in the UK region.
Posted Jan 23, 2026 - 06:39 PST
This incident affected: Regional Real Time Communication (United Kingdom - Real Time Communication).