Inter-region disruption
Incident Report for LiveKit
Postmortem

May 11th, 11:28 UTC - Hosting provider notified us of connectivity issues

Our hosting provider notified us of a connectivity issue affecting multiple data centers and advised that there might be intermittent connection issues. Our monitoring did not pick up any noticeable disruptions at the time, and all systems appeared to function normally.

May 11th, 13:33 UTC - Hosting provider resolved connectivity issues

Our hosting provider implemented a fix for the connectivity issues they had reported. During this period we still did not observe any issues on our side, so we took no action at that point.

May 11th, 14:01 UTC - Inter-datacenter connectivity severed

Connections between data centers started failing. Our team was paged about the issue and decided to reroute traffic to other data centers.

May 11th, 14:20 UTC - Reroute completed

All user connections were moved to other data centers, and inter-region connectivity was restored. This concluded the incident.

We expect our data centers to be isolated in order to deliver 99.99% uptime. Incidents like this weaken our confidence in the provider’s data center and operational isolation. Because of that, we’ve decided to remove them from our global fleet and replace them with Google Cloud. This transition will be completed by May 19th.

Posted May 17, 2023 - 23:53 PDT

Resolved
Our hosting provider temporarily lost connectivity on their inter-datacenter links. As a result, users could only retrieve media from other users connected to the same data center. The disruption lasted about 20 minutes. We have rerouted traffic to alternative data centers.
Posted May 11, 2023 - 07:00 PDT