Hi folks,
I'm running a set of mail servers on CentOS 5 with a 32-bit PAE kernel, which I'm stuck with due to vendor requirements.
Whenever I bring the kernel version past 2.6.18-238, the OOM killer starts killing a variety of processes, despite there being over 10GB of free memory and zero swap used.
The original dmesg output from an OOM is:
http://paste.linuxassist.net/view/c116d5f0
I noticed that lowmem was mostly exhausted (1-3MB free at times), so I tried aggressively raising vm.lowmem_reserve_ratio and vm.min_free_kbytes to 256 256 250 and 100,000 respectively (increased in stages until I reached those levels). Watching /proc/meminfo during an OOM now shows ~150MB of lowmem free while OOMs are still occurring, so I think I can rule out lowmem exhaustion.
After tweaking those values, the dmesg output for an OOM is:
http://paste.linuxassist.net/view/e0cbbd61
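In case it helps anyone reproduce the /proc/meminfo check, here is a rough sketch of how the lowmem figures can be sampled. It is not the exact method used above. It assumes a highmem (32-bit) kernel that exposes LowTotal/LowFree/HighFree in /proc/meminfo, and Python 3 (stock CentOS 5 ships an older Python, so it would need adapting there). The function names parse_meminfo, lowmem_report and watch are made up for illustration.

```python
import re
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^([^:]+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def lowmem_report(fields):
    """Summarise low/high memory and swap use.

    Returns None when LowTotal/LowFree are absent, which is the case on
    kernels built without highmem support (e.g. 64-bit kernels).
    """
    low_total = fields.get("LowTotal")
    low_free = fields.get("LowFree")
    if not low_total or low_free is None:
        return None
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return {
        "low_free_mb": low_free / 1024.0,
        "low_free_pct": 100.0 * low_free / low_total,
        "high_free_mb": fields.get("HighFree", 0) / 1024.0,
        "swap_used_mb": swap_used / 1024.0,
    }


def watch(path="/proc/meminfo", interval=1.0, count=10):
    """Print a timestamped lowmem summary `count` times, `interval` seconds apart."""
    for _ in range(count):
        with open(path) as f:
            report = lowmem_report(parse_meminfo(f.read()))
        print(time.strftime("%H:%M:%S"), report)
        time.sleep(interval)
```

Calling watch() in a terminal while the box is under load and matching the timestamps against the OOM entries in dmesg gives a rough picture of how much lowmem was free around each kill.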
Does anyone have any thoughts on this? I'm at a loss as to what is causing it. Suggestions such as upgrading to 64-bit or a newer version of CentOS are non-viable due to vendor-specific limitations.
Specifically I'm looking for an analysis of the dmesg output, which should indicate the problem - I'm just not quite sure how to interpret the memory exhaustion shown.