Hey guys,
I ran into a really weird L2/L3 issue this week which I'm hoping someone can shed some light on for me. I've got a decent grasp on networking in general but less so on the Linux side, and the behavior I'm seeing is simply baffling me.
I've got two servers: server A and server B. At one step of a deploy process we run, server B wgets a file from server A. We noticed the job was dying and started to investigate.
Both servers are on the same /24 subnet. Server B can ping server A, the GW, etc. Server A can't ping server B, but it can ping the GW. I noticed that when I tried to ping server B from server A, an ARP entry got created, so something was happening. I dug a little deeper and tried an arping, and layer 2 looked like it was working fine. Then, like magic, after I ran the arping from server A to server B, L3 suddenly started working: I could ping server B, and the wget worked fine.
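For anyone wanting to check the state of that ARP entry: on Linux the kernel's ARP table is exposed as text in /proc/net/arp, and an entry without the 0x2 (ATF_COM, "completed") flag bit is one the kernel doesn't consider resolved, typically shown with an all-zero HW address. Below is a rough Python 3 sketch that parses that table. It assumes the usual six-column layout, and the sample rows and 192.0.2.x addresses are made up for illustration.

```python
import os

# Flag bit from <linux/if_arp.h>: set when the kernel considers the entry resolved.
ATF_COM = 0x02

# Made-up sample in the /proc/net/arp layout (192.0.2.x is a documentation range).
SAMPLE = """IP address       HW type     Flags       HW address            Mask     Device
192.0.2.10       0x1         0x2         00:11:22:33:44:55     *        eth0
192.0.2.20       0x1         0x0         00:00:00:00:00:00     *        eth0
"""


def parse_proc_net_arp(text):
    """Parse /proc/net/arp-style text into a list of dicts, one per neighbor entry."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip anything that doesn't match the expected layout
        ip, _hw_type, flags, hw_addr, _mask, device = fields[:6]
        entries.append({
            "ip": ip,
            "hw_addr": hw_addr,
            "device": device,
            "complete": bool(int(flags, 16) & ATF_COM),  # flags column is hex, e.g. "0x2"
        })
    return entries


if __name__ == "__main__":
    path = "/proc/net/arp"
    # Use the real table when running on Linux, otherwise fall back to the sample.
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    for e in parse_proc_net_arp(text):
        state = "complete" if e["complete"] else "INCOMPLETE"
        print("%-15s %-17s %-6s %s" % (e["ip"], e["hw_addr"], e["device"], state))
```

`ip neigh show` gives the richer neighbor state names (INCOMPLETE, REACHABLE, STALE, FAILED) if you'd rather not parse the proc file.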
Any idea what's going on here, and why running an arping (which I understand is strictly L2) would magically fix an L3 networking issue?
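On the "strictly L2" point: an ARP request rides directly in an Ethernet frame (EtherType 0x0806), but per RFC 826 its payload still carries IPv4 addresses for both the sender and the target. Here's a minimal Python 3 sketch that builds and decodes that 28-byte payload, assuming Ethernet and IPv4. It doesn't send anything (that would need a raw socket and root), the MAC and 192.0.2.x addresses are hypothetical stand-ins for server A and server B, and how a particular arping build fills these fields may differ.

```python
import socket
import struct

# ARP payload layout (RFC 826) for Ethernet + IPv4, network byte order, 28 bytes:
# HTYPE(2) PTYPE(2) HLEN(1) PLEN(1) OPER(2) SHA(6) SPA(4) THA(6) TPA(4)
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Return the ARP request payload (no Ethernet header) asking who has target_ip."""
    return struct.pack(
        ARP_FORMAT,
        1,                            # HTYPE: Ethernet
        0x0800,                       # PTYPE: IPv4
        6,                            # HLEN: MAC address length
        4,                            # PLEN: IPv4 address length
        1,                            # OPER: 1 = request, 2 = reply
        mac_to_bytes(sender_mac),     # SHA: sender MAC
        socket.inet_aton(sender_ip),  # SPA: sender IPv4 address
        b"\x00" * 6,                  # THA: unknown in a request, left as zeros
        socket.inet_aton(target_ip),  # TPA: the IPv4 address being resolved
    )


def describe(payload):
    """Unpack a payload built above and return its key fields."""
    _htype, _ptype, _hlen, _plen, oper, sha, spa, _tha, tpa = struct.unpack(ARP_FORMAT, payload)
    return {
        "oper": oper,
        "sender_mac": ":".join("%02x" % b for b in sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


if __name__ == "__main__":
    # Hypothetical addresses standing in for server A (sender) and server B (target).
    pkt = build_arp_request("00:11:22:33:44:55", "192.0.2.10", "192.0.2.20")
    print(len(pkt), describe(pkt))
```

RFC 826 also has the receiving host use the sender fields (SHA/SPA) to add or update an entry for the sender in its own table, so an ARP exchange can touch neighbor state on both ends.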
EDIT: Figured I'd throw this in here: these boxes are running Red Hat 5.10 with the 2.6.18 kernel.