[sheepdog] leave event is not dispatched in the corosync driver

tuji tuji at atworks.co.jp
Fri Sep 12 06:00:58 CEST 2014


Hi

I found a problem: when one of the nodes is stopped while recovery is running,
that node is never treated as having left the cluster.
I reported it on Launchpad (https://bugs.launchpad.net/sheepdog-project/+bug/1368503).

To work around this problem, I made a patch for corosync.c:

[root@node001 BUILD]# diff -u sheepdog-0.7.6-org/sheep/cluster/corosync.c sheepdog-0.7.6/sheep/cluster/corosync.c
--- sheepdog-0.7.6-org/sheep/cluster/corosync.c 2013-12-22 18:07:34.000000000 +0900
+++ sheepdog-0.7.6/sheep/cluster/corosync.c     2014-09-12 09:47:37.840975169 +0900
@@ -368,8 +368,9 @@
                 * number of alive nodes correctly, we postpone
                 * processsing events if there are incoming ones.
                 */
-               sd_debug("wait for a next dispatch event");
-               return;
+               sd_debug("wait for a next dispatch event.not return");
+               //sd_debug("wait for a next dispatch event");
+               //return;
        }

        nr_majority = 0;

This patch solved the problem for me.
I know the patch is insufficient, because it disables the behavior described in this comment:

                /*
                 * Corosync dispatches leave events one by one even
                 * when network partition has occured.  To count the
                 * number of alive nodes correctly, we postpone
                 * processsing events if there are incoming ones.
                 */

I do not fully understand this comment.
Could anyone give me advice about it?
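
My tentative reading of the comment, written as a minimal sketch: if the
majority check runs after every single leave event, the membership shrinks
step by step and the check may never fail, so the check is postponed until
no more events are pending. The function names below are hypothetical and
this is not the code in corosync.c, only an illustration of that reading.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the surviving nodes are a strict majority of the membership. */
bool has_majority(int nr_alive, int nr_members)
{
	assert(nr_members > 0);
	return nr_alive * 2 > nr_members;
}

/*
 * Hypothetical "eager" variant: check after every single leave event and
 * shrink the membership each time. Returns true if any check fails.
 */
bool partition_detected_eagerly(int nr_members, int nr_leaves)
{
	for (int i = 0; i < nr_leaves; i++) {
		if (!has_majority(nr_members - 1, nr_members))
			return true;
		nr_members--;	/* this leave is processed before the next one */
	}
	return false;
}

/*
 * Hypothetical "postponed" variant: wait until no more leave events are
 * pending, then count all of them against the original membership.
 */
bool partition_detected_postponed(int nr_members, int nr_leaves)
{
	return !has_majority(nr_members - nr_leaves, nr_members);
}
```

For example, with 5 members and 3 leave events, the eager check sees 4 of 5,
then 3 of 4, then 2 of 3, and never reports a partition. The postponed check
sees 2 of 5 and does report one. If this reading is right, removing the
"return" means the majority check can run on incomplete information.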



--------------------------
Masahiro Tsuji

A.T.WORKS, INC
URL http://www.atworks.co.jp



