The server has very high IO wait when it's busy, which I already know is a problem. I have used a combination of iostat, iotop, ps and strace to determine that the IO wait comes from streaming video content (this server hosts high-traffic porn sites). Could there be a correlation between the high disk IO and Apache PIDs suddenly going defunct and needing to be manually killed?
So far I've assumed there is no correlation between the httpd zombies and the high disk IO, since I haven't found one.
Either way, what's the best way to really dig into the httpd zombies and find out what's going on? The IO issue I can 'fix' by throwing hardware at the problem, but the sudden httpd zombies I can't figure out. Even if it's a code issue, I would like to find out which section of code specifically (if possible) is responsible.
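One rough starting point, as a sketch only: a zombie is a process that has exited but whose parent has not yet called wait() on it, so it may help to log each defunct (Z) process next to its parent's state. If the parent shows up in uninterruptible sleep (D), that would at least hint at a link to the disk IO. The script below assumes a Linux /proc filesystem; the function names (parse_stat, scan, report) are made up for illustration and are not part of any existing tool.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def scan(proc_root="/proc"):
    """Return {pid: (comm, state, ppid)} for every readable process."""
    procs = {}
    if not os.path.isdir(proc_root):
        return procs  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process went away or stat was unreadable
        procs[pid] = (comm, state, ppid)
    return procs


def report(procs):
    """List processes in Z (zombie) or D (uninterruptible sleep) state,
    together with the name and state of their parent."""
    lines = []
    for pid, (comm, state, ppid) in sorted(procs.items()):
        if state in ("Z", "D"):
            parent_comm, parent_state, _ = procs.get(ppid, ("?", "?", 0))
            lines.append(
                f"{state} pid={pid} comm={comm} ppid={ppid} "
                f"parent={parent_comm} parent_state={parent_state}"
            )
    return lines


if __name__ == "__main__":
    for line in report(scan()):
        print(line)
```

Run from cron or a loop with timestamps during busy periods, the output could be lined up against iostat samples to see whether zombies appear while their parent httpd is stuck in D state.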
Any ideas on how I might go about this? Or at least narrow it down? Crazy ideas welcome.