Users unable to receive media
Incident Report for LiveKit
Postmortem

This incident was caused by inadequate backwards compatibility in the RoomService code. We had deployed a change that fixed a few bugs in RoomService. The fixes relied on changes to both the RoomService API and the media instances. The change was deployed to RoomService instances immediately, but the corresponding change to media instances was deployed in canary mode in a single region.

When CreateRoom was called in that region, it caused the media node handling the request to panic.

LiveKit clients are built to handle resume and reconnection, so when the media node crashed, participants were automatically migrated to a new instance, causing a brief pause in streams in that region. The disruption should have been short and should have recovered automatically.

To mitigate future disruption to service, we’ll ensure that service and media changes are always backwards compatible for at least one version.
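
As an illustration of what "backwards compatible for at least one version" can mean in practice, here is a minimal sketch in which the API tier only sends the newest request format to media nodes that advertise support for it. This is not LiveKit's actual code: `MediaNode`, `protocol_version`, `build_create_room_request`, and the version numbers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical version numbers, for illustration only.
LEGACY_VERSION = 1   # request format understood by the previous release
CURRENT_VERSION = 2  # request format introduced by the new release


@dataclass(frozen=True)
class MediaNode:
    """Hypothetical view of a media node as seen by the API tier."""
    name: str
    protocol_version: int  # highest request format the node understands


def build_create_room_request(room_name: str, node: MediaNode) -> dict:
    """Build a request in the newest format the target node understands."""
    if node.protocol_version < LEGACY_VERSION:
        # More than one version behind: refuse instead of sending
        # something the node may not be able to parse.
        raise ValueError(
            f"{node.name} speaks unsupported version {node.protocol_version}"
        )
    version = min(CURRENT_VERSION, node.protocol_version)
    request = {"version": version, "room": room_name}
    if version >= CURRENT_VERSION:
        # Fields added by the new release are only sent to nodes that
        # have already been upgraded.
        request["options"] = {"new_behavior": True}
    return request
```

With a check like this, a partially rolled-out fleet keeps working: nodes on the previous release receive the old request shape, and only upgraded nodes receive the new fields.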

Posted Nov 19, 2022 - 00:01 PST

Resolved
This incident has been resolved.
Posted Nov 16, 2022 - 12:11 PST
Update
We have identified the issue and deployed a fix. We will continue to monitor for issues.
Posted Nov 16, 2022 - 11:33 PST
Investigating
We are currently investigating the issue.
Posted Nov 16, 2022 - 11:07 PST
This incident affected: Global Real Time Communication.