Post Mortem of Outage – Monday 07/02/2018

Description of Outage:

At 6:30 AM ET on 7/2/2018, our morning testing showed that SureDesk users were unable to log in. Anyone attempting to log in received a message that apps and desktops were unavailable.


Outage Review:

We determined that the Delivery Controller was not responding, which prevented users from connecting to their desktop environments. Our normal restore efforts for this kind of report did not immediately resolve the issue because some services remained locked up. We then began a restore process for all of the SQL databases running on the login server, identified the database that was corrupted, and restored it. All services were restored at 9:30 AM, and most customers were notified at that time. We then did additional testing for customers with custom servers to ensure that all logins were restored. All customers were notified by 10:15 AM. Total system outage was 3 hours.



From our review of the incident, we identified a process improvement: compacting the databases every 3 months so that a restore is much faster if a similar database corruption occurs again in the Delivery Controller. We implemented and tested this process, and a restore should take less than 10 minutes if a similar issue occurs in the future.
