
At my wit's end. NIC issue that I can't seem to track down. Advice needed!


I want to start off by stating that I am by NO MEANS an expert Linux administrator. I am learning and attempting to set up a home environment for my RHCSA/RHCE studies.

I have a server that I bought as a refurb from a fairly credible company. I am using this server through its eth0 port. The server's IP is 172.20.20.2/24. I have KVM installed on the server, so in the commands I include in my text dump you will see some 192.168.1xx.xxx/24 virtual interfaces, etc.

Now, from my test machine (workstation 172.20.20.5) I am able to SSH and VNC to the server when it wants to play nicely. I have a gig port from the .5 workstation to a gig switch that is plugged directly into eth0 on the server. It may be worth noting that this server is running CentOS 6.4. I also have an ESX host on this switch that runs completely fine.

Basically, the issue is that the server randomly starts refusing connections over SSH and VNC. It could be immediately after power-on, intermittently fading in and out, etc. I can be on VNC one moment, get "connection refused" the next, and then be connected again 5 minutes later.

A ping from the 172.20.20.5 workstation shows spotty latency during the times when connections are failing. However, it will randomly start behaving again with perfect LAN latency of <1 ms, and while the latency is good, everything connects. Then, when it decides to derp, things stop connecting. I sniffed the packets before and during the issue and didn't see anything particularly noteworthy. I also ran a tcpdump on the server, which I can host somewhere if need be.
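For reference, the checks I was running were roughly along these lines (the exact commands and output are in the dump linked further down; eth0 and the 172.20.20.x addresses are just from my setup):

    # From the workstation: ping with per-packet timestamps so the latency
    # spikes line up with the capture
    ping -D 172.20.20.2

    # On the server: capture ARP and ICMP while the drops are happening
    tcpdump -i eth0 -nn -w nic-issue.pcap arp or icmp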

I have made a text file dump of various commands showing the device, driver, configurations, etc.

http://textuploader.com/?p=6&id=lEFWW

Please let me know what else could be needed to help troubleshoot. This is driving me up the wall. I have updated the bnx2 NIC driver to the most recent version from the Broadcom site. I have disabled iptables and SELinux completely. I have allowed the eth ports as trusted. I have turned off NetworkManager. I messed with IP forwarding so that my KVM machines could connect, but it had no bearing on whether or not the spikes would happen.
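On CentOS 6 that was roughly the following (from memory, so treat it as the general idea rather than an exact transcript of what I typed):

    # Stop iptables now and keep it off across reboots
    service iptables stop
    chkconfig iptables off

    # Put SELinux in permissive mode now and disable it permanently
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

    # Turn off NetworkManager so it stops touching eth0
    service NetworkManager stop
    chkconfig NetworkManager off

    # IP forwarding toggle I was playing with for the KVM guests
    sysctl -w net.ipv4.ip_forward=1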

My last thought is that it may be a hardware issue. There is another onboard NIC I could attempt to test with, and I will likely do that when I am off work... Until then, can anyone offer any advice or point me in the right direction?

ANY help is greatly appreciated. I've been bashing my head against this for a while now and just want to be able to roll out my RHCSA labs!

Thank you!

EDIT: There is also a cron job shown in the dump that I set up hoping a ping from inside could keep things alive, but it did not help either.
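The cron entry was something along these lines (the exact one is in the dump; the .5 workstation is just what I happened to ping):

    # Every minute, ping the workstation a few times and discard the output
    * * * * * /bin/ping -c 3 172.20.20.5 >/dev/null 2>&1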


@ ALL

OH MY GOD, I MIGHT BE AN IDIOT. Stand by.

EDIT2:

YUP! Think I found the issue, and I am -in fact- an idiot.

Looks like when I set the 172.20.20.2 address on the server, I wasn't expecting my DHCP server's pool to include that address space (for some reason I thought I had given it 100-150 to use). Turns out .2 was being taken by one of the damn cell phones in the house, hence the Google Talk entries in the capture. When the server lost connection it was going apeshit with ARPs because of the IP conflict, ARPing itself onto that IP while the phone was using it. I noted that when I rebooted the server, I lost connection to the IP for a brief second, then it started going apeshit (wireless phone two floors up somewhere), and then after the server came fully online it would be clean for a while -- presumably once it had kicked the phone off the address.

TLDR: Duplicate IP address > me. I can't believe I completely overlooked that shit. I am ashamed to call myself a CCNP.

TIL: LOOK AT THE FUCKING MAC ADDRESSES!
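If anyone else hits this, checking the ARP cache from another box makes the duplicate obvious, and arping has a duplicate-address-detection mode (interface and addresses here are just from my setup; run the arping check from a machine that doesn't already own the address):

    # On the workstation: whose MAC is actually answering for .2 right now?
    ping -c 1 172.20.20.2
    arp -an | grep 172.20.20.2

    # On the server: what its real MAC is, for comparison
    ip link show eth0

    # Duplicate address detection: any reply means someone else has the IP
    arping -D -I eth0 -c 2 172.20.20.2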

submitted by rexis89
