Pacemaker/Corosync, DRBD on ESX virtual machines issues

Hello everyone, I've been struggling with the above for a while now and eventually decided to look for help on Reddit.

I have two VMs. The /home partition is managed by DRBD, and on top of that I run Pacemaker, with Corosync listening on a heartbeat VLAN.

Once in a while (every few weeks, sometimes months) I get an email about a split brain, and this is what I find in the logs:

Aug 5 23:38:07 fs0 kernel: [68355583.821768] block drbd0: sock was shut down by peer
Aug 5 23:38:07 fs0 kernel: [68355583.821778] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Aug 5 23:38:07 fs0 kernel: [68355583.842257] block drbd0: Creating new current UUID
Aug 5 23:38:07 fs0 kernel: [68355583.866926] block drbd0: asender terminated
Aug 5 23:38:07 fs0 kernel: [68355583.866933] block drbd0: Terminating drbd0_asender
Aug 5 23:38:07 fs0 kernel: [68355584.507196] block drbd0: Connection closed
Aug 5 23:38:07 fs0 kernel: [68355584.507226] block drbd0: conn( BrokenPipe -> Unconnected )
Aug 5 23:38:07 fs0 kernel: [68355584.507234] block drbd0: receiver terminated
Aug 5 23:38:07 fs0 kernel: [68355584.507237] block drbd0: Restarting drbd0_receiver
Aug 5 23:38:07 fs0 kernel: [68355584.507240] block drbd0: receiver (re)started
Aug 5 23:38:07 fs0 kernel: [68355584.507245] block drbd0: conn( Unconnected -> WFConnection )
Aug 5 23:38:08 fs0 kernel: [68355584.612101] block drbd0: Handshake successful: Agreed network protocol version 91
Aug 5 23:38:08 fs0 kernel: [68355584.612111] block drbd0: conn( WFConnection -> WFReportParams )
Aug 5 23:38:08 fs0 kernel: [68355584.612140] block drbd0: Starting asender thread (from drbd0_receiver [4346])
Aug 5 23:38:08 fs0 kernel: [68355584.615955] block drbd0: data-integrity-alg: <not-used>
Aug 5 23:38:08 fs0 kernel: [68355584.615973] block drbd0: drbd_sync_handshake:
Aug 5 23:38:08 fs0 kernel: [68355584.615977] block drbd0: self 9C6012EC1D3A23DF:F838BBE1953EC251:38E221DC90C8B46B:2D25DFD8EF4F70D7 bits:1 flags:0
Aug 5 23:38:08 fs0 kernel: [68355584.615982] block drbd0: peer EA40E1F0BB878A3F:F838BBE1953EC250:38E221DC90C8B46A:2D25DFD8EF4F70D7 bits:0 flags:0
Aug 5 23:38:08 fs0 kernel: [68355584.615986] block drbd0: uuid_compare()=100 by rule 90
Aug 5 23:38:08 fs0 kernel: [68355584.621228] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Aug 5 23:38:11 fs0 kernel: [68355584.644234] block drbd0: conn( WFReportParams -> NetworkFailure )
Aug 5 23:38:11 fs0 kernel: [68355584.644369] block drbd0: asender terminated
Aug 5 23:38:11 fs0 kernel: [68355584.644377] block drbd0: Terminating drbd0_asender
Aug 5 23:38:12 fs0 notify-split-brain.sh[21452]: invoked for r0
Aug 5 23:38:12 fs0 kernel: [68355586.148682] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Aug 5 23:38:12 fs0 kernel: [68355586.148703] block drbd0: conn( NetworkFailure -> Disconnecting )
Aug 5 23:38:12 fs0 kernel: [68355586.149526] block drbd0: Connection closed
Aug 5 23:38:12 fs0 kernel: [68355586.149537] block drbd0: conn( Disconnecting -> StandAlone )
Aug 5 23:38:12 fs0 kernel: [68355586.149751] block drbd0: receiver terminated
Aug 5 23:38:12 fs0 kernel: [68355586.149754] block drbd0: Terminating drbd0_receiver
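In case it helps anyone reading along, here is a minimal Python sketch for pulling just the state transitions out of a log like the one above, so the order of events is easier to see. It is only an illustration and not part of my setup: the function name `extract_transitions` and the list of field names in the regex are my own assumptions, based on the `peer( ... )`, `conn( ... )` and `pdsk( ... )` entries that appear in this log.

```python
import re

# Matches DRBD state changes such as "conn( Connected -> BrokenPipe )".
# The field list is an assumption; this log only shows peer, conn and pdsk.
TRANSITION = re.compile(r"\b(peer|conn|pdsk|disk|role)\( (\w+) -> (\w+) \)")


def extract_transitions(log_text):
    """Return (field, old_state, new_state) tuples in order of appearance."""
    return [m.groups() for m in TRANSITION.finditer(log_text)]


# Two entries copied from the log above.
sample = (
    "Aug 5 23:38:07 fs0 kernel: [68355583.821778] block drbd0: "
    "peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) "
    "pdsk( UpToDate -> DUnknown )\n"
    "Aug 5 23:38:12 fs0 kernel: [68355586.149537] block drbd0: "
    "conn( Disconnecting -> StandAlone )\n"
)

for field, old, new in extract_transitions(sample):
    print(f"{field}: {old} -> {new}")
```

Run against the full log, this should reduce it to the chain of `conn` states from Connected through to StandAlone.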

It looks like the connection between the nodes timed out, although the network was up the whole time on both hosts. The failure occurred moments after the VM was migrated by vMotion. I was told that some packets may be lost during a vMotion. Is that the case?

Is there a Corosync setting I could tweak to mitigate these issues? The TTL in the Corosync config is currently set to 1.

Thanks for your help!

submitted by SysadminOfThings

