RHEL 6.7 + Tomcat6 Clustering

I'm trying to roll out a webapp that we have clustered. The issue I'm running into is that both nodes leave the cluster and never come back.

OS: RHEL 6.7
Java: OpenJDK-1.7.0_95
Tomcat: Tomcat 6.0.24 (the version shipped with RHEL)

Scenario:

Node A comes up and attempts to start the cluster. I let the webapp fully deploy before attempting to bring up Node B.
Node B comes up, and I can see in the logs on Node A that Node B has joined the cluster and vice versa.
After about 3 minutes, I see these messages on both nodes:

From Node A:

    Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,id={-58 -120 -100 45 114 46 65 -65 -79 72 75 32 -77 -50 -55 -64 }, payload={}, command={}, domain={}, ]] message. Will verify.
    Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 6}:4001,{172, 16, 17, 6},4001, alive=215620,id={-58 -120 -100 45 114 46 65 -65 -79 72 75 32 -77 -50 -55 -64 }, payload={}, command={}, domain={}, ]]

From Node B:

    Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 4}:4001,{172, 16, 17, 4},4001, alive=188567,id={-108 49 70 30 -44 -105 73 -12 -69 50 -69 60 -28 123 -126 -39 }, payload={}, command={}, domain={}, ]] message. Will verify.
    Mar 10, 2016 9:17:36 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{172, 16, 17, 4}:4001,{172, 16, 17, 4},4001, alive=188567,id={-108 49 70 30 -44 -105 73 -12 -69 50 -69 60 -28 123 -126 -39 }, payload={}, command={}, domain={}, ]]

From what I understand, a "memberDisappeared" message followed by "Verification complete. Member still alive" is supposed to be a false positive in this scenario. However, if I let the nodes sit like this long enough, they stop communicating altogether and the services need to be restarted.

Example: I shut down Node B, and I see Node B leave the cluster on Node A. If I let Node A sit long enough (5 or 10 minutes) and then start Node B to bring it back into the cluster, Node A never sees Node B join, but Node B does see Node A join.
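
To check whether every "memberDisappeared" report really is followed by a "still alive" verification, the log can be tallied per member. The following is a minimal sketch, assuming Python 3 and assuming the log text has the same message format as the excerpts above; `summarize` and `unverified` are hypothetical helper names, not part of Tomcat.

```python
import re
from collections import Counter

# Matches either TcpFailureDetector message and captures the
# tcp://{a, b, c, d}:port part of the member description that follows it.
EVENT_RE = re.compile(
    r"(Received memberDisappeared|Member still alive)\["
    r".*?tcp://\{(\d+), (\d+), (\d+), (\d+)\}:(\d+)",
    re.S,
)


def summarize(log_text):
    """Return {"ip:port": Counter(disappeared=n, still_alive=m)}."""
    summary = {}
    for match in EVENT_RE.finditer(log_text):
        message, a, b, c, d, port = match.groups()
        address = "%s.%s.%s.%s:%s" % (a, b, c, d, port)
        kind = "disappeared" if message.startswith("Received") else "still_alive"
        summary.setdefault(address, Counter())[kind] += 1
    return summary


def unverified(summary):
    """Members reported as disappeared more often than confirmed alive."""
    return [addr for addr, counts in summary.items()
            if counts["disappeared"] > counts["still_alive"]]
```

Calling `unverified(summarize(text))` on a node's log would list any member whose disappearance was never contradicted by a verification, which is the case that would not be a false positive.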

netstat -g output on both servers:

    IPv6/IPv4 Group Memberships
    Interface       RefCnt Group
    --------------- ------ ---------------------
    lo              1      all-systems.mcast.net
    eth1            1      239.255.0.2
    eth1            1      all-systems.mcast.net

Both servers appear to be in the proper multicast group. I used a utility from the RHEL Support site (multicast.py) that can act as either a multicast server or a client. When I stand up the server side on one node and then start the client on the other, I can see the multicast "Hellos" arriving at the server from the client. I never see it drop any multicast packets in this test.
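
For reference, the same kind of check can be pointed at the group and port that Tomcat itself uses (239.255.0.2:45565 in the server.xml below). This is a minimal sketch, assuming Python 3; it is not the multicast.py utility, and `build_mreq` and `listen` are hypothetical names. Sharing the port with a running Tomcat depends on socket reuse settings, so it may be simpler to run it with Tomcat stopped on the listening node.

```python
import socket

GROUP = "239.255.0.2"   # Membership address from server.xml
PORT = 45565            # Membership port from server.xml


def build_mreq(group, interface_ip="0.0.0.0"):
    """Pack the ip_mreq structure: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def listen(group=GROUP, port=PORT, interface_ip="0.0.0.0", timeout=5.0):
    """Join the group and print the sender of every datagram received.

    Stops once no packet has arrived for `timeout` seconds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface_ip))
    sock.settimeout(timeout)
    try:
        while True:
            data, sender = sock.recvfrom(65535)
            print("%d bytes from %s:%d" % (len(data), sender[0], sender[1]))
    except socket.timeout:
        print("no multicast packets for %.1f seconds" % timeout)
    finally:
        sock.close()
```

Running `listen(interface_ip="172.16.17.4")` on Node A (or the other node's eth1 address on Node B) would show whether heartbeats from the peer keep arriving after the 3 minute mark.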

server.xml from both nodes:

    <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
      <Manager className="org.apache.catalina.ha.session.DeltaManager"
               expireSessionsOnShutdown="false"
               notifyListenersOnReplication="true"/>
      <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <Membership className="org.apache.catalina.tribes.membership.McastService"
                    address="239.255.0.2"
                    port="45565"
                    frequency="500"
                    dropTime="3000"
                    mcastTTL="1"/>
        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                  address="auto"
                  port="4001"
                  autoBind="100"
                  selectorTimeout="5000"
                  maxThreads="6"/>
        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
          <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
        </Sender>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
      </Channel>
      <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
      <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
    </Cluster>
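
As a sanity check on the membership timing, the Membership element can be read programmatically. This is a minimal sketch, assuming Python 3 and assuming the usual McastService meaning of the attributes (`frequency` is the heartbeat interval in milliseconds, `dropTime` is how long a member may stay silent before it is reported as disappeared); `heartbeat_budget` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The Membership element copied from the server.xml above.
MEMBERSHIP_XML = (
    '<Membership className="org.apache.catalina.tribes.membership.McastService" '
    'address="239.255.0.2" port="45565" frequency="500" dropTime="3000" '
    'mcastTTL="1"/>'
)


def heartbeat_budget(membership_xml):
    """Summarize how many heartbeats can be missed before a member is dropped."""
    elem = ET.fromstring(membership_xml)
    frequency_ms = int(elem.get("frequency"))
    drop_time_ms = int(elem.get("dropTime"))
    return {
        "group": "%s:%s" % (elem.get("address"), elem.get("port")),
        "frequency_ms": frequency_ms,
        "drop_time_ms": drop_time_ms,
        # Whole heartbeat intervals that fit inside the drop window.
        "heartbeats_before_drop": drop_time_ms // frequency_ms,
    }
```

With the values above, `heartbeat_budget(MEMBERSHIP_XML)["heartbeats_before_drop"]` is 6, so under that assumption a memberDisappeared report means roughly six consecutive heartbeats from the peer went unseen.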

I'm really at a loss as to where to go next in my troubleshooting. I've consulted my Network team about the issue. We have many other services in our DC that use multicast for cluster heartbeats, and the Network team has advised that those services required no special configuration on their side. I wanted to reach out to the community to see if there is something simple that I am missing before I open a support case with Red Hat.

submitted by /u/suntzu420