Apologies if you've tried to visit this site over the past week or two and run into an error of some sort. The servers that slickhouse is hosted on have gone down several times, and for once the cause was human error!
The first time, I had set up a new virtual server (Windows Server 2003) running Active Directory and DNS. After testing AD locally on my workstation, I decided to roll it out by joining each server to the domain in turn. Unfortunately, I forgot about Windows Update, which proceeded to install updates on each server as it restarted. That was the first stage of the downtime. Once the main host server had restarted, it then ran a chkdsk that I had scheduled some months ago (to run on the next restart). This took the whole night!
Because the host server has only a single-core CPU, each virtual server takes a good 30 minutes to start up and get everything running as before. That added yet more downtime to an already growing 12+ hours.
The second bout of downtime was last night. I had moved the antivirus (Symantec AV) from the host server itself to one of the virtual servers. Bad mistake. When the scheduled scan kicked in at 3am, it brought the whole platform down: all four virtual servers and the host. That was another 18+ hours!
I've since moved the antivirus back to the host server and have opted for ClamWin, a great open-source program that runs on both Windows workstation and server operating systems. If all goes to plan, it will do its stuff tonight without a blip in performance.
So, the moral of the story? There are several in this instance:
- Install updates on servers regularly, so that pending updates don't get in the way when it comes to a shutdown or restart
- Schedule planned downtime a good few hours before you go to bed, in case anything goes wrong in the process
- Use chkdsk startup scans with caution, and remember when you've scheduled them
- Don't fix what isn't broken, such as moving antivirus duties from the host server to a virtual one when the host had been scanning the virtuals fine for the past 2 years without a hiccup
- Don't port forward RDP to a virtual server only: when the platform goes down, you won't be able to connect externally to restart it all
Be patient if you experience downtime in the near future. At least now you know it's most likely down to human error on my part!