[sheepdog] epoch number mismatch
tuji
tuji at atworks.co.jp
Thu Jul 17 13:25:33 CEST 2014
Hi, I've been reporting this bug:
https://bugs.launchpad.net/sheepdog-project/+bug/1329806
I have found what is happening between sheepdog and corosync,
but I have no idea how to avoid it.
For Node104:
There is one "A new membership was formed" message. It lists the same nodes as both joined and left.
And there is no "cdrv_cpg_confchg" in sheepdog.log.
-----corosync.log-------
Jul 14 10:25:26 [2880] node104 corosync notice [TOTEM ] A new membership (10.0.1.1:206200) was formed. Members joined: 167772417 167772418 167772419 167772420 left: 167772417 167772418 167772419 167772420
-----sheepdog.log-------
none
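
The member IDs in the corosync line above look like IPv4 addresses packed into 32-bit integers, which is how corosync derives a nodeid when none is set in corosync.conf. Assuming that is the case on this cluster, a minimal sketch for decoding them follows. The helper name nodeid_to_ipv4 is made up for illustration and is not part of corosync or sheepdog.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a corosync nodeid as a dotted-quad IPv4
 * address, assuming the nodeid was derived from the ring0 address
 * (most significant byte first).
 */
static void nodeid_to_ipv4(uint32_t nodeid, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u.%u",
		 (unsigned)((nodeid >> 24) & 0xff),
		 (unsigned)((nodeid >> 16) & 0xff),
		 (unsigned)((nodeid >> 8) & 0xff),
		 (unsigned)(nodeid & 0xff));
}
```

Under that assumption, 167772417 through 167772420 decode to 10.0.1.1 through 10.0.1.4, and 167772520 (the ID seen in the Node001 log) decodes to 10.0.1.104, which would be Node104 itself.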
For Node001:
There are two "A new membership was formed" lines.
And there are two "cdrv_cpg_confchg" entries in sheepdog.log (left and joined).
-----corosync.log-------
Jul 14 10:25:26 [31893] node001 corosync notice [TOTEM ] A new membership (10.0.1.1:206196) was formed. Members left: 167772520
Jul 14 10:25:26 [31893] node001 corosync notice [TOTEM ] A new membership (10.0.1.1:206200) was formed. Members joined: 167772520
-----sheepdog.log-------
Jul 14 10:25:26 DEBUG [main] cdrv_cpg_confchg(553) mem:4, joined:0, left:1
Jul 14 10:25:26 DEBUG [main] cdrv_cpg_confchg(553) mem:5, joined:1, left:0
The last cdrv_cpg_confchg event, with joined:1, was not generated by sheepdog. It was generated by corosync.
And Node001 never receives a COROSYNC_MSG_TYPE_JOIN message after that cdrv_cpg_confchg.
So cevent->msg is never filled in when __corosync_dispatch_one runs.
-------------------------------------------
static bool __corosync_dispatch_one(struct corosync_event *cevent)
{
	struct sd_node entries[SD_MAX_NODES], *node;
	struct cpg_node *n;
	int idx;
	switch (cevent->type) {
	case COROSYNC_EVENT_TYPE_JOIN:
		if (!cevent->msg)
			/* we haven't receive JOIN yet */
			return false;
-------------------------------------------
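
To illustrate why this check matters, here is a simplified model of the situation, not the real sheepdog code. It assumes the dispatcher handles queued events in order and stops at the first one that cannot be handled yet. All names (struct ev, dispatch_one, dispatch_all) are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the corosync driver types. */
enum ev_type { EV_JOIN, EV_LEAVE };

struct ev {
	enum ev_type type;
	const char *msg;	/* NULL until the node's JOIN message arrives */
};

/* Same idea as the check quoted above: a JOIN event without its
 * message cannot be dispatched yet. */
static bool dispatch_one(const struct ev *e)
{
	if (e->type == EV_JOIN && !e->msg)
		return false;
	return true;
}

/* Assumed dispatcher behavior: process events in order and stop at
 * the first one that is not ready. Returns how many were dispatched. */
static size_t dispatch_all(const struct ev *q, size_t n)
{
	size_t done = 0;

	while (done < n && dispatch_one(&q[done]))
		done++;
	return done;
}
```

In this model, a queue holding a leave event followed by a join event whose msg is still NULL dispatches only the leave event. Everything behind the join event waits until msg is set, and if the JOIN message never arrives, it waits forever.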
--------------------------
Masahiro Tsuji
A.T.WORKS, INC
URL http://www.atworks.co.jp