Incident Summary: On January 2, 2025, at 8:50 AM PST, a node in our database cluster became unhealthy, which increased the load on the remaining nodes. As a result, sequential unlocking requests backed up in a queue and were processed with a delay.
Impact: Users experienced delays in sequential unlocking operations for approximately 4 hours and 25 minutes. No data loss or corruption was observed, but service performance was degraded during this period.
Timeline:
Resolution: The unhealthy node was identified and restored to a healthy state, which allowed the system to work through the backlog of delayed queries. Once the backlog was cleared, sequential unlocking returned to normal.
Next Steps: To prevent recurrence, the following actions will be taken:
We sincerely apologize for the inconvenience caused and appreciate your understanding as we work to improve the resilience of our systems.