Incident Summary
On August 29, 2025, the Mindtickle platform experienced a temporary disruption during which some users were unable to log in or access programs. The issue was identified and resolved within 28 minutes, and the platform returned to normal operation.
- Start time: August 29, 2025, 12:49 PM PT
- End time: August 29, 2025, 1:17 PM PT
Impact Area
The following functionality was impacted during the incident:
- User logins
- Access to programs and assets (assigned series, modules, and assets)
Incident Timeline
- August 29, 2025, 12:49 PM PT: Users began experiencing login and program access errors.
- August 29, 2025, 12:55 PM PT: The engineering team detected elevated error rates and began an investigation.
- August 29, 2025, 1:17 PM PT: Corrective actions were applied and services were restored to a stable state.
Root Cause Analysis
The disruption was caused by resource exhaustion in one database cluster, which led to request timeouts and high error rates for affected services. Once the cause was identified, the engineering team stabilized services by resetting resource pools and prioritizing critical traffic.
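The report does not describe how critical traffic was prioritized. As an illustration only, one common approach is a bounded resource pool that holds back a few slots for critical requests, so that non-critical load cannot exhaust the pool. The class name `PriorityPool` and its parameters below are hypothetical and are not taken from the Mindtickle platform.

```python
import threading


class PriorityPool:
    """Bounded pool that keeps some slots free for critical requests.

    Hypothetical sketch: names and sizes are assumptions for illustration.
    """

    def __init__(self, size, reserved_for_critical):
        if not 0 <= reserved_for_critical <= size:
            raise ValueError("reserved_for_critical must be between 0 and size")
        self.size = size
        self.reserved = reserved_for_critical
        self.in_use = 0
        self._lock = threading.Lock()

    def try_acquire(self, critical=False):
        """Return True if a slot was granted, False if the caller should back off."""
        with self._lock:
            # Non-critical callers may only use the unreserved part of the pool.
            limit = self.size if critical else self.size - self.reserved
            if self.in_use < limit:
                self.in_use += 1
                return True
            return False

    def release(self):
        """Return a previously acquired slot to the pool."""
        with self._lock:
            if self.in_use > 0:
                self.in_use -= 1
```

With `PriorityPool(size=3, reserved_for_critical=1)`, non-critical requests can hold at most two slots, so a critical request such as a login can still obtain the third even when the rest of the pool is busy.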
Next Steps and Preventive Actions
- System Safeguards: Implement circuit breakers to isolate and recover from failures faster.
- Resiliency Improvements: Maintain priority channels for critical operations to reduce customer impact in similar scenarios.
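For readers unfamiliar with the circuit breaker pattern named above, the following is a minimal sketch of the general technique, not the planned implementation. After a number of consecutive failures the breaker "opens" and rejects calls immediately, which keeps load off the failing dependency. Once a timeout has passed, it lets one trial call through to check for recovery. The names `CircuitBreaker` and `CircuitOpenError` and the default thresholds are assumptions chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical sketch of a circuit breaker; thresholds are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of adding load to the dependency.
                raise CircuitOpenError("dependency temporarily isolated")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(run_query, sql)` where `run_query` is a placeholder for the real database call, and treat `CircuitOpenError` as a signal to return a fallback response or retry later.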