[sheepdog] [PATCH] sheep: fix a epoch mismatch bug
Liu Yuan
namei.unix at gmail.com
Mon May 28 11:53:57 CEST 2012
On 05/28/2012 05:24 PM, Liu Yuan wrote:
> This is a nasty fallout from removing register/un-register group_fd; it can be
> observed with the following script:
>
> Join a new node while another node leaves in the meantime
> ==============================
> for i in 0 1 2; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done
> collie/collie cluster format -c 3
> collie/collie vdi create test0 100M -P
> sleep 1
> for i in 3; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done
> for i in 1; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";done;
> ==============================
>
> The culprit is that we fail to increment sys->epoch because the system status
> of the newly joined node is still SD_STATUS_WAIT_FOR_FORMAT before
> __sd_join_done() is called.
>
> The fix is simple: add a new status to indicate that "I have already joined;
> although I still need to update other state, I am capable of recovering".
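
To make the idea in the quoted description concrete, here is a minimal sketch in C. It is not the patch itself: every name prefixed with sketch_/SKETCH_ is hypothetical, the name of the new status is invented for illustration, and the assumption that the epoch increment is gated on a "can recover" status check is only my reading of the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values; only SD_STATUS_WAIT_FOR_FORMAT is named
 * in the description, the others are placeholders for illustration. */
enum sketch_status {
	SKETCH_STATUS_OK,
	SKETCH_STATUS_WAIT_FOR_FORMAT,	/* stands in for SD_STATUS_WAIT_FOR_FORMAT */
	SKETCH_STATUS_JOINED_PENDING,	/* hypothetical new "already joined" status */
};

/* Hypothetical stand-in for the node-local system state. */
struct sketch_sys {
	enum sketch_status status;
	uint32_t epoch;
};

/* Assumption: only a node in one of these states may take part in recovery. */
static bool sketch_can_recover(enum sketch_status status)
{
	return status == SKETCH_STATUS_OK ||
	       status == SKETCH_STATUS_JOINED_PENDING;
}

/* Called when another node leaves. Returns true if the epoch was bumped. */
static bool sketch_handle_leave(struct sketch_sys *sys)
{
	if (!sketch_can_recover(sys->status))
		return false;	/* epoch stays behind the rest of the cluster */
	sys->epoch++;
	return true;
}
```

With this sketch, a node still marked SKETCH_STATUS_WAIT_FOR_FORMAT skips the increment and ends up one epoch behind, while a node marked with the new status keeps its epoch in step with the cluster.
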
Hmm, this patch introduces a regression: it can't handle multiple nodes joining
correctly. I'm still investigating what causes the problem.
Thanks,
Yuan