Delay in Sequential Unlocking Operation

Incident Report for MindTickle, Inc.

Postmortem

Incident Summary: On January 2, 2025, at 8:50 AM PST, one of the nodes in our database cluster became unhealthy, which increased the load on the remaining nodes. As a result, sequential unlocking requests piled up in a queue and were processed with a delay.

Impact: Users experienced delays in sequential unlocking operations for approximately 4 hours and 25 minutes. No data loss or corruption was observed, but service performance was degraded during this period.

Timeline:

  • [02-Jan-2025, 08:50 AM PST]: One of the database cluster nodes became unhealthy, and the issue was detected.
  • [02-Jan-2025, 09:20 AM PST]: Sequential unlocking requests began experiencing delays.
  • [02-Jan-2025, 01:15 PM PST]: The unhealthy node issue was resolved, and the backlog of queries began processing.
  • [02-Jan-2025, 01:15 PM PST]: Backlog fully cleared, and normal operations resumed.

Resolution: We identified the unhealthy node and restored it to a healthy state, which allowed the system to process the backlog of delayed queries. Once the backlog was cleared, sequential unlocking operations returned to normal.

Next Steps: To prevent a recurrence, we will take the following actions:

  1. Implement additional monitoring and alerting to detect similar database node health issues early.
  2. Review and optimize our handling of queued requests to minimize delays during high-load scenarios.

We sincerely apologize for the inconvenience caused and appreciate your understanding as we work to improve the resilience of our systems.

Posted Jan 15, 2025 - 05:07 PST

Resolved

On January 2, 2025, between 8:50 AM and 1:15 PM PST, one of the nodes in our database cluster became unhealthy, which increased the load on the remaining nodes. As a result, sequential unlocking requests piled up in a queue and were processed with a delay.
Posted Jan 02, 2025 - 08:50 PST