Hello all,
I've been having an issue with one of our HA clusters, where the only service is a single IP address.
A few months ago the service was disabled for maintenance, and since then I've been unable to enable it again.
Cluster Status for FTP @ Fri Feb 19 14:33:25 2016
Member Status: Quorate

 Member Name          ID   Status
 ------ ----          ---- ------
 clu-ftp1             1    Online, Local, rgmanager
 clu-ftp2             2    Online, rgmanager
 clu-script           3    Online, rgmanager
 /dev/block/120:0     0    Online, Quorum Disk

 Service Name         Owner (Last)    State
 ------- ----         ----- ------    -----
 service:FTPIP        (clu-ftp1)      disabled

clusvcadm -e FTPIP
Local machine trying to enable service:FTPIP...Failure
The service can run on 2 of the servers. The 3rd server is part of the cluster only so that it can access the shared storage.
<?xml version="1.0"?>
<cluster config_version="68" name="FTP">
  <clusternodes>
    <clusternode name="clu-ftp1" nodeid="1">
      <fence>
        <method name="Method1">
          <device name="ftp1-ilo-fence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="clu-ftp2" nodeid="2">
      <fence>
        <method name="Method2">
          <device name="ftp2-ilo-fence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="clu-scripts" nodeid="3"/>
  </clusternodes>
  <cman expected_votes="3" quorum_dev_poll="250000"/>
  <fence_daemon post_fail_delay="20" post_join_delay="20"/>
  <logging logfile="/var/log/cluster/cluster.log"/>
  <totem consensus="10000" join="120" token="250000"/>
  <quorumd device="/dev/emcpowera" interval="5" label="lock_disk" min_score="1" tko="24" votes="1">
    <heuristic interval="5" program="touch /quorum" tko="5"/>
    <heuristic interval="5" program="/usr/sbin/clustat | grep 'Online, Quorum Disk'" tko="5"/>
  </quorumd>
  <rm>
    <failoverdomains>
      <failoverdomain name="FTP1" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="ftp1" priority="1"/>
        <failoverdomainnode name="ftp2" priority="2"/>
      </failoverdomain>
      <failoverdomain name="FTP2" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="ftp1" priority="2"/>
        <failoverdomainnode name="ftp2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service domain="FTP1" name="FTPIP" recovery="relocate">
      <ip address="x.x.x.x" monitor_link="on" sleeptime="10"/>
    </service>
  </rm>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="y.y.y.y" lanplus="on" login="fence" name="ftp1-ilo-fence" passwd="xxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="z.z.z.z" lanplus="on" login="fence" name="ftp2-ilo-fence" passwd="xxx"/>
  </fencedevices>
</cluster>
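One check that might be worth making on a config like the one above: whether every failoverdomainnode name also appears as a clusternode name. This is a minimal sketch, not a diagnosis. It assumes the two sets of names are meant to match exactly, and any mismatch in the posted config may simply come from anonymizing it. The function name unmatched_domain_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET


def unmatched_domain_nodes(conf_xml):
    """Return {failover domain name: [member names that are not cluster node names]}.

    Assumption: failoverdomainnode names are expected to match clusternode
    names exactly. conf_xml is the text (or bytes) of a cluster.conf file.
    """
    root = ET.fromstring(conf_xml)
    # Names of all nodes declared under <clusternodes>
    node_names = {node.get("name") for node in root.iter("clusternode")}
    unmatched = {}
    for domain in root.iter("failoverdomain"):
        # Domain members that do not correspond to any declared cluster node
        missing = [
            member.get("name")
            for member in domain.iter("failoverdomainnode")
            if member.get("name") not in node_names
        ]
        if missing:
            unmatched[domain.get("name")] = missing
    return unmatched
```

To use it, read the file in binary mode (for example open("/etc/cluster/cluster.conf", "rb").read()) and pass the result to the function. An empty dict means every domain member matches a declared node.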
Nothing is showing up in any of the logs... I can start the service successfully by running:
rg_test test /etc/cluster/cluster.conf start service FTPIP
Running in test mode.
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/orainstance.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/checkquorum
Loading resource rule from /usr/share/cluster/oralistener.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/fence_scsi_check.pl
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/named.sh
Loading resource rule from /usr/share/cluster/fence_scsi_check_hardreboot.pl
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/svclib_nfslock
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/bind-mount.sh
Loading resource rule from /usr/share/cluster/nfsexport.sh
Starting FTPIP...
<debug> Link for bond0: Detected
[ip] Link for bond0: Detected
<info> Adding IPv4 address x.x.x.x/x to bond0
[ip] Adding IPv4 address x.x.x.x/x to bond0
<debug> Pinging addr x.x.x.x from dev bond0
[ip] Pinging addr x.x.x.x from dev bond0
<debug> Sending gratuitous ARP: x.x.x.x y:y:y:y:y:y brd ff:ff:ff:ff:ff:ff
[ip] Sending gratuitous ARP: x.x.x.x y:y:y:y:y:y brd ff:ff:ff:ff:ff:ff
rdisc: no process killed
Start of FTPIP complete
After that the service is running successfully, although clustat still shows it as disabled, and it no longer fails over automatically...
Anyone got any ideas? I've been googling a lot trying to fix this, but nothing seems to be working.
Thanks in advance...