[sheepdog] [PATCH v3] corosync: fix cluster hang by cluster requests blocking confchg
MORITA Kazutaka
morita.kazutaka at gmail.com
Thu Jul 5 13:57:19 CEST 2012
At Thu, 5 Jul 2012 19:36:04 +0800,
Liu Yuan wrote:
>
> From: Liu Yuan <tailai.ly at taobao.com>
>
> v3:
> - corosync only sends node events one at a time, so we can just add the leave
> event to the head
> ------------------------------------------------ >8
>
> This hang is caused by a cluster request (add new vdi):
>
> 1) a cluster request blocks the cluster and waits for its worker to finish.
> 2) a confchg happens, but is queued after this cluster request.
> 3) cluster_request_fn() issues a write request, which always fails because of
> the node failure, and retries forever.
> 4) cluster_request_done() is never called, so we can't unblock the event list.
> This can be reproduced reliably with the following script:
> ================
>
> for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
> sleep 1
> collie/collie cluster format -c 3
> echo create new vdis
> (
> for i in `seq 0 40`;do
> collie/collie vdi create test$i 4M
> done
> ) &
>
> echo kill nodes
> sleep 1
> for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 1;done;
>
> for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
>
> echo wait for object recovery to finish
> for ((;;)); do
> if [ "$(pgrep collie)" ]; then
> sleep 1
> else
> break
> fi
> done
> =================
>
> The fix adds leave confchg events to the head of the event list. Join confchg
> handling is untouched.
>
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
> sheep/cluster/corosync.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
> index 330cb71..ca737b9 100644
> --- a/sheep/cluster/corosync.c
> +++ b/sheep/cluster/corosync.c
> @@ -198,8 +198,8 @@ retry:
> return 0;
> }
>
> -static struct corosync_event *find_event(enum corosync_event_type type,
> - struct cpg_node *sender)
> +static inline struct corosync_event *find_event(enum corosync_event_type type,
> + struct cpg_node *sender)
> {
> struct corosync_event *cevent;
>
> @@ -561,7 +561,11 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
> cevent->type = COROSYNC_EVENT_TYPE_LEAVE;
> cevent->sender = left_sheep[i];
>
> - list_add_tail(&cevent->list, &corosync_event_list);
> + /*
> + * Leave event would possibly be blocked by cluster request
> + * so we add it to the head of event list
> + */
> + list_add(&cevent->list, &corosync_event_list);

Corosync delivers confchg events even while previous events remain in
corosync_event_list, doesn't it? I think adding events to the head of
the queue breaks the guarantee that all nodes receive confchg events
in the same order.
Thanks,
Kazutaka
> }
>
> /* dispatch join_handler */
> --
> 1.7.10.2
>
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog