Intermittent issues loading the Mindtickle platform
Incident Report for MindTickle, Inc.
Postmortem

We have performed the root cause analysis of the incident as part of our incident management postmortem process.

Impact:

  • Users could not access the coaching and mission modules from 11:34 PM PT until midnight, for approximately 36 minutes.
  • Some users intermittently received errors while loading the pages on both learning and admin sites.

Root Cause Analysis:

  • We changed our Couchbase database cluster by adding a new bucket to handle application caching.
  • After this bucket was added, one of the nodes of the Couchbase cluster became unavailable and immediately came back up. This triggered an unintended node rebalance, which passed unparsable events to the coaching application and made the coaching and mission modules temporarily unavailable.
  • Further, the newly added node could not connect with the platform applications through the existing cluster authentication. Load-balanced requests routed to this node failed, and users received errors on the admin and learning sites. Our initial analysis suggests this is a corner-case bug in Couchbase.

Actions Taken:

  • Once the node rebalancing activity was completed, the coaching application became available, and the team re-processed the required events to ensure data consistency.
  • The newly created node was moved from the original cluster to a different cluster. This resolved the authentication issue and user requests started processing normally.
  • As a preventive action, we have implemented the necessary validations in the application to ensure that such unparsable events do not cause issues in the future.
  • We are further exploring the reasons behind the node authentication issue and working with the database support team to identify additional action items.
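
The validation described in the preventive action above can be illustrated with a minimal sketch. It is not the actual Mindtickle implementation: the report does not describe the event format, so the JSON encoding, the required "type" field, and the names process_events and handle are all assumptions made for illustration. The idea is that an event that cannot be parsed is set aside for later re-processing instead of stopping the consumer.

```python
import json


def process_events(raw_events, handle):
    """Pass each parsable event to `handle`; set the rest aside.

    Hypothetical sketch: events are assumed to be JSON objects that
    carry a "type" field. Returns (processed_count, rejected_events).
    """
    processed = 0
    rejected = []
    for raw in raw_events:
        try:
            event = json.loads(raw)
        except (ValueError, TypeError):
            # Malformed JSON, undecodable bytes, or a non-string payload:
            # keep the raw event so it can be inspected and re-processed.
            rejected.append(raw)
            continue
        if not isinstance(event, dict) or "type" not in event:
            # Valid JSON, but not the shape the application expects.
            rejected.append(raw)
            continue
        handle(event)
        processed += 1
    return processed, rejected
```

Keeping the rejected events, rather than dropping them, matches the re-processing step listed under Actions Taken: once the cause is understood, the set-aside events can be replayed to restore data consistency.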
Posted Aug 13, 2021 - 09:18 PDT

Resolved
This incident has been resolved.
Posted Aug 12, 2021 - 00:14 PDT
Monitoring
Coaching and mission modules are operating normally now. We have implemented fixes to resolve intermittent issues affecting the Mindtickle platform.
Posted Aug 12, 2021 - 00:01 PDT
Update
We are continuing to investigate this issue.
Posted Aug 11, 2021 - 23:46 PDT
Investigating
Several users have reported intermittent issues loading the Mindtickle platform. Coaching and mission modules are also unavailable. We are currently investigating this issue.
Posted Aug 11, 2021 - 23:45 PDT
This incident affected: Practice and Execution (Mission, Coaching Sessions), Operational (Login), and Interface (Admin Site, Learning Site, Mobile App).