On September 19, 2025, during a periodic platform upgrade, the Mindtickle platform experienced an outage lasting approximately 1 hour and 43 minutes.
The disruption was caused by a configuration issue on the upgraded servers, which led to resource constraints. As a result, application services could not run as expected, causing downtime for a set of customers in the US region.
Our engineering team identified the issue, corrected the configuration, and rotated the affected servers. Services were fully restored and stabilized thereafter.
The outage was caused by misconfigured disk sizes on the newly upgraded servers. This resulted in resource shortages that prevented application services from running.
This misconfiguration was not detected during pre-upgrade validation because upgrade scripts did not fully account for updated server requirements.
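As an illustration only, a pre-upgrade validation step of the kind described above could compare each server's provisioned disk size against the updated requirement before the server takes traffic. The sketch below is not Mindtickle's actual tooling: the `DiskRequirement` type, the function names, and the threshold values are all hypothetical, and the only real API used is Python's standard `shutil.disk_usage`.

```python
import shutil
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per gibibyte


@dataclass(frozen=True)
class DiskRequirement:
    """Hypothetical requirement: a mount path and its minimum total size."""
    path: str
    min_total_gib: float


def check_disk(req, usage_fn=shutil.disk_usage):
    """Return a failure message if the disk is too small, else None."""
    total_gib = usage_fn(req.path).total / GIB
    if total_gib < req.min_total_gib:
        return (
            f"{req.path}: {total_gib:.1f} GiB total, "
            f"requires at least {req.min_total_gib:.1f} GiB"
        )
    return None


def validate(requirements, usage_fn=shutil.disk_usage):
    """Check every requirement and collect all failure messages."""
    failures = []
    for req in requirements:
        message = check_disk(req, usage_fn)
        if message is not None:
            failures.append(message)
    return failures


if __name__ == "__main__":
    # Assumed threshold for illustration; real values would come from
    # the updated server requirements.
    problems = validate([DiskRequirement(path="/", min_total_gib=100)])
    for problem in problems:
        print("FAIL", problem)
    raise SystemExit(1 if problems else 0)
```

Exiting with a non-zero status on any failure lets an upgrade script halt before a misconfigured server is put into rotation. The `usage_fn` parameter exists only so the check can be exercised without touching a real disk.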
To prevent recurrence, we are implementing the following measures:
We sincerely apologize for the disruption this outage caused. We are committed to learning from this incident and strengthening our upgrade and validation processes to ensure greater reliability and resilience of the Mindtickle platform.