I've grown the web hosting position at our central IT department from just me managing shared web hosting into a three-person web hosting team. We now run services from shared web hosting through dedicated virtual machines up to high-volume, load-balanced custom hosting, and we support a variety of languages and stacks. I now find myself in the happy position of having the free time to really work on getting supporting infrastructure in place to streamline our services, workflow, uptime, etc.
We already use Puppet to manage our servers and virtual hosts, and we have a monitoring team that handles monitoring and trending for us. I'm going to get the web logs centralized and regularly monitored (we already do this for OS logs). I'm also considering analytics to help decide when hosts need to be shuffled around or given more resources, and maybe automated auditing of some kind. We have backups, but I'm sure we could do more with them.
If you suddenly had the free time to improve your stuff rather than just handle tickets, what would you do? What other supporting services do you folks use to make your stuff more stable, easier to manage, faster to restore, etc.?