High API failure rate due to load balancer failure
Incident Report for LiveKit
Postmortem

At around 16:35 UTC Jan 6, our cloud provider’s API service went down in US East, while we had high traffic volume in the region. This caused issues with autoscaling, as well as with the cloud provider’s Kubernetes CNI plugin. The outcome was that new API requests to LiveKit Cloud would randomly fail, while already established WebSocket/Signal connections remained unaffected. Cloud Dashboard pages could also fail to load.

Customer traffic was diverted to the next closest regions. Once our cloud provider resolved their API issues, we were able to restore functionality of LiveKit Cloud in the US East region. Service was fully restored by 20:35 UTC Jan 6.

Posted Jan 06, 2023 - 12:52 PST

Resolved
This incident has been resolved.
Posted Jan 06, 2023 - 12:34 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 06, 2023 - 12:29 PST
Update
We are continuing to work on a fix for this issue.
Posted Jan 06, 2023 - 09:30 PST
Identified
We've identified an issue with our cloud provider that is causing random failures with API requests in our US East region. We are continuing to monitor the situation.
Posted Jan 06, 2023 - 09:00 PST
This incident affected: Cloud Dashboard (cloud.livekit.io).