We have a CentOS 6/7 environment of approximately 170 workstations and about 280 servers. Each of these machines hard-mounts a shared /home where users store code, shell scripts, and general user files.
Lately we have been having issues with users unintentionally kicking off huge file transfers to /home, or running a build and dumping the output to /home. This drives disk IO through the roof on the shared server and leaves systems unresponsive for the duration. While some of this is due to older hardware, I would like to avoid repeatedly upgrading hardware as a temporary fix and instead find a real solution.
Here are my questions: Is there a way to set IO threshold limits for /home on a per-user basis? Is this something that should even be considered, or is it going down the wrong path?
Is there a way to load balance this NFS traffic across multiple servers to reduce IO on a single host?
Are other Linux shops providing a shared /home to all machines?
I don't know what I don't know on this issue, and I feel like there must be an alternative I am unaware of.