Emergency maintenance update

The emergency maintenance that CSIT had scheduled for this evening did not go as expected and, unfortunately, resulted in a significant interruption in service.  Shortly after 7:00pm, as we began work on the primary UPS, a secondary UPS that we had expected to carry the electrical load proved unable to do so. As a result, our main VMware cluster and storage arrays lost power, effectively shutting down the cluster.  The impact was evident both immediately, as some servers became unresponsive to clients, and some minutes later, as, for example, network devices attempted to revalidate their connections.

The service interruption lasted approximately 90 minutes as we worked through the process of bringing storage, servers and applications back online.  At the moment, we believe that all services have been restored, but we will be monitoring closely and invite reports of any issues you might experience, whether or not they are clearly related to this evening's incident.

We apologize for the inconvenience this interruption has caused, and we appreciate your patience as we work to ensure that all services have been restored.  As for the original issue that prompted the maintenance, CSIT will announce the schedule for any further operations.