Sometimes you’re so proud of the team you have to talk about it (or blog about it in this case).
Last week’s DC area storms were particularly harsh. Several times throughout the week flash flood watches and warnings were issued, creeks flooded, roads were closed, and downloads of ‘ark’ blueprints spiked on the internet. Really, we’ve had A LOT of rain.
One morning during a storm, our network monitoring agents alerted Ed Hunter (Manager of the Network Operations Center) that a client’s network was down. This client is part of our managed services group. We usually find it is simply a power or internet connectivity issue that is easily resolved. Ed called our client to let them know that there was a problem and that we were already troubleshooting it. It turned out there was a big problem - the server room was flooded.
At this point Ed notified Brian (Vice President of Network Systems & Support) and Chris (CTO) and set our disaster recovery plan into action. Within 45 minutes Ed was on-site assessing the situation. He could see a ceiling panel had bowed down and buckled from the weight of the water entering from above. The balcony drain on the third floor had clogged, sending water through the doors and the ceiling to the second floor and into the server room.
Because of the unusual nature of the ‘issue’ and the critical nature of the outage, Brian and Chris arrived on-site to assist Ed. By 11:00 am they were pulling the servers out of the rack and taking them apart. Each server drained a few gallons of water as it was removed. Six servers, representing the organization’s entire network and data, were removed from the rack and drained.
After many hours of drying, the servers were powered up again. By 5:00 pm, five out of six servers were operational. However, the MS Exchange server was still down and required a motherboard replacement. We were able to obtain a replacement motherboard and install it by 7:30 pm. Everything was back up and running by 9:00 pm, less than 15 and a half hours later.
We had two other plans in place: moving the client to a virtualized server bank, or co-locating their file servers at our data center. We, and they, were happy we didn’t need to go to plan B or C.
While not everyone can emerge from a situation like this quickly and with minimal damage, everyone can take steps to minimize their vulnerability to these types of disasters. It’s important to assess the location of your servers for dangers and try to eliminate them. What’s above your servers? How quickly would you be back up and running?
The bottom line is that a significant level of effort went into our managed service offering, and seeing the team execute it flawlessly is certainly worth a shout-out to our network systems and support team.