At approximately 8:30pm PT, we began maintenance to remove unused data centers from our production configuration. Once the changes had been applied, at around 9:30pm PT, we started to see issues with cross-region communication over our global messaging bus.
After investigating, we found that errors in these config changes were disrupting the messaging bus's connectivity between regions. After reverting the changes, we restored connectivity between regions at approximately 11:30pm PT.
We're putting two measures in place to avoid similar issues in the future. First, we're adding monitoring and paging so that we can identify and catch these issues quickly. Second, we've added further process around config changes for our global messaging bus so that the configs we apply are correct going forward.