Channel: linuxadmin: Expanding Linux SysAdmin knowledge

RHEL 6.7 + Tomcat6 Clustering


I'm trying to roll out a webapp that we have clustered. The issue I'm running into is that both nodes leave the cluster and never come back.

  • OS: RHEL 6.7
  • Java: OpenJDK-1.7.0_95
  • Tomcat: Tomcat 6.0.24 (the version shipped with RHEL)

Scenario:

  • Node A comes up and attempts to start the cluster. I let the webapp fully deploy before attempting to bring up Node B.
  • Node B comes up, and I can see in the logs on Node A that Node B has joined the cluster and vice versa.
  • After about 3 mins, I see this error on both nodes:

From Node A:

Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,id={-58 -120 -100 45 114 46 65 -65 -79 72 75 32 -77 -50 -55 -64 }, payload={}, command={}, domain={}, ]] message. Will verify.
Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,id={-58 -120 -100 45 114 46 65 -65 -79 72 75 32 -77 -50 -55 -64 }, payload={}, command={}, domain={}, ]]

From Node B:

Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 4}:4001,{172, 16, 17, 4},4001, alive=188567,id={-108 49 70 30 -44 -105 73 -12 -69 50 -69 60 -28 123 -126 -39 }, payload={}, command={}, domain={}, ]] message. Will verify.
Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 4}:4001,{172, 16, 17, 4},4001, alive=188567,id={-108 49 70 30 -44 -105 73 -12 -69 50 -69 60 -28 123 -126 -39 }, payload={}, command={}, domain={}, ]]

From what I understand, the "memberDisappeared" message followed by "Verification complete. Member still alive" is supposed to be a false positive in this scenario. If I let the nodes sit like this long enough, they stop communicating altogether and the services need to be restarted. Example: I shut down Node B and see Node B leave the cluster on Node A. If I let Node A sit long enough (5 or 10 mins) and then start Node B to bring it back into the cluster, Node A never sees Node B join, but Node B sees Node A join.
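For comparing the two nodes' logs, a small parser can pull the member address and the `alive` counter out of each TcpFailureDetector line. This is only a sketch: the regular expression is written against the log lines quoted above, the helper name `parse_member` is made up for illustration, and it assumes `alive` is reported in milliseconds (which I believe is how Tribes reports member alive time, but treat that as an assumption).

```python
import re

# Matches the member description as it appears in the log lines above, e.g.
# MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,
MEMBER_RE = re.compile(
    r"MemberImpl\[tcp://\{(?P<octets>[\d, ]+)\}:(?P<port>\d+),.*?alive=(?P<alive>\d+)"
)

def parse_member(line):
    """Return (ip, port, alive_seconds) for the first member in a log line, or None."""
    m = MEMBER_RE.search(line)
    if m is None:
        return None
    ip = ".".join(part.strip() for part in m.group("octets").split(","))
    # Assumption: the alive counter is in milliseconds.
    return ip, int(m.group("port")), int(m.group("alive")) / 1000.0

# Node A's log line, shortened after the alive field for readability.
sample = ("INFO: Received memberDisappeared[org.apache.catalina.tribes.membership."
          "MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,")

ip, port, alive_s = parse_member(sample)
print(ip, port, round(alive_s / 60, 1))  # 172.16.17.6 4001 3.6
```

Under that assumption, the 215620 and 188567 values work out to roughly 3.6 and 3.1 minutes, which lines up with the "about 3 mins" before the messages appear.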

netstat -g output on both servers:

IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      all-systems.mcast.net
eth1            1      239.255.0.2
eth1            1      all-systems.mcast.net

Both servers appear to be in the proper multicast group. I used a utility found on the RHEL Support site (multicast.py) that lets you stand up a multicast server and client with the same script. When I stand up the server side and then enable the client on the other node, I can see the multicast "Hellos" coming from the client to the server. I never see it drop any multicast packets in this test.
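A similar check can be run against the exact group and port that Tomcat uses, rather than the test utility's defaults. The following is a minimal sketch, not the Red Hat script: the function names and the `iface_ip` parameter are made up for illustration, and it assumes the listener is allowed to bind the same UDP port alongside Tomcat via `SO_REUSEADDR`.

```python
import socket
import struct

GROUP = "239.255.0.2"   # address from the McastService element in server.xml
PORT = 45565            # port from the same element

def membership_request(group, iface_ip="0.0.0.0"):
    """Pack an ip_mreq: the multicast group followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)

def listen(group=GROUP, port=PORT, iface_ip="0.0.0.0", timeout=10.0):
    """Join the group and print the source of each datagram until none arrive for `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(timeout)
    try:
        while True:
            data, (src, _) = sock.recvfrom(2048)
            print("%d bytes from %s" % (len(data), src))
    except socket.timeout:
        print("no multicast traffic for %.0f s" % timeout)
    finally:
        sock.close()
```

Calling `listen(iface_ip=...)` with the node's own eth1 address, on each node while the other node's Tomcat is running, should show whether heartbeats keep arriving from the peer past the 3 minute mark. On a host with more than one interface, passing the interface address explicitly avoids joining the group on the wrong NIC.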

server.xml from both nodes:

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="239.255.0.2"
                port="45565"
                frequency="500"
                dropTime="3000"
                mcastTTL="1"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="4001"
              autoBind="100"
              selectorTimeout="5000"
              maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
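To put the membership settings in perspective, the heartbeat budget implied by `frequency` and `dropTime` can be read straight from the config. This is a rough sketch that assumes both attributes are in milliseconds, as the Tomcat cluster documentation describes them; the helper name `heartbeat_budget` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The Membership element from the server.xml above.
MEMBERSHIP_XML = '''<Membership className="org.apache.catalina.tribes.membership.McastService"
    address="239.255.0.2" port="45565" frequency="500" dropTime="3000" mcastTTL="1"/>'''

def heartbeat_budget(xml_text):
    """Return how many heartbeat intervals fit into the drop window."""
    elem = ET.fromstring(xml_text)
    frequency_ms = int(elem.get("frequency"))  # interval between heartbeats
    drop_time_ms = int(elem.get("dropTime"))   # silence tolerated before a member is dropped
    return drop_time_ms // frequency_ms

print(heartbeat_budget(MEMBERSHIP_XML))  # 6
```

So a member is reported as disappeared after about six consecutive heartbeats go missing, i.e. three seconds of multicast silence. Note also that `mcastTTL="1"` keeps the heartbeats from crossing a router, so both nodes need to sit on the same subnet for membership to work.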

I'm really at a loss as to where to go next in my troubleshooting. I've consulted my network team about the issue; we have many other services in our DC that use multicast for cluster heartbeats, and they advised that those services required no special configuration on the network side. I wanted to reach out to the community to see if there is something simple that I'm missing before I go ahead and open a support case with Red Hat.

submitted by /u/suntzu420
