Analysis: After further investigation, GoFormz engineers determined that this incident was triggered by an unusually large mobile Sync process, initiated by a GoFormz customer, that caused GoFormz’s Permission Service to become non-responsive. As designed, the Permission Service started an auto-heal process. However, while the auto-heal was in progress, the Sync Service continued to receive 500 error codes, which degraded performance until the Sync Service was manually restarted.
Mitigation: To reduce the likelihood of this type of incident recurring, GoFormz engineering has scaled up the Permission Service to better handle such unusual usage spikes. We are also reconfiguring the Permission and Sync Service auto-heal policies to appropriately handle 500 error codes from different platform services.