We currently have a mixture of boxes in our DCs, and the hardware management controllers are lacking, to say the least (Supermicro's management controller monitoring is especially shit).
I want to improve the hardware monitoring, and so far I've implemented the following tools:
megactl - RAID controller and array monitoring (working well)
mcelog - for machine check errors (seems to be a bit flaky)
I've also looked at edac-utils, but it seems my chipset isn't supported.
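For anyone wanting to check the same thing, here is a minimal sketch of how EDAC support could be confirmed on a box. It assumes the usual kernel sysfs layout (`/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count`); the function name `read_edac_counts` is made up for illustration. An empty result would suggest that no EDAC driver registered a memory controller for the chipset.

```python
from pathlib import Path

# Assumption: EDAC memory controllers appear under this sysfs directory.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce_count": int or None, "ue_count": int or None}}.

    An empty dict means no memory controller directory was found,
    which is what an unsupported chipset would look like.
    """
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                # Each file holds a single integer counter.
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Missing or unreadable counter: record it as unknown.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("no EDAC memory controllers registered")
    for mc, c in found.items():
        print(mc, c)
```

The root path is a parameter so the function can be pointed at a test directory, and the output could be fed into whatever alerting is already in place.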
Are there any other tools (ideally RHEL or CentOS based) that would help us catch hardware problems more proactively?