I have a two-node RHEL VM cluster serving shared storage over DRBD and NFSv4, behind a floating IP, to two multi-instance WMQ nodes. Performance is fine using ext4; however, in some failover scenarios I am not getting the expected results. For example:
1- When I crash the active cluster node, the primary WMQ node doesn't simply reconnect to the secondary node; instead the mount goes stale and its load skyrockets. Is it that the NFS sessions are not being cleared/transferred correctly? I tried moving /var/lib/nfs to one of the served DRBD mounts so the sessions would be transferred to the second NFS cluster node, but no luck.
2- Sometimes I see corruption of the server files when crashing nodes. Any recommendations for avoiding corruption as much as possible? I tried mounting the NFS share with sync instead of async, but this is a performance killer for WMQ, so it's not an option.
3- QDISK seems annoying in some situations. Does anyone have experience setting up the cluster without it, perhaps by introducing a fencing delay on one of the nodes?