[sheepdog] [PATCH v3] corosync: fix cluster hang by cluster requests blocking confchg

Thu Jul 5 13:57:19 CEST 2012

At Thu,  5 Jul 2012 19:36:04 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly at taobao.com>
> 
> v3:
>  - corosync only delivers node events one at a time, so we can just add the
>    leave event to the head
> ------------------------------------------------ >8
> 
> This hang is caused by a cluster request (adding a new vdi):
> 
> 1) a cluster request blocks the cluster and waits for its worker to finish.
> 2) a confchg happens, but is queued after this cluster request.
> 3) cluster_request_fn() issues a write request, which always fails because of
>    one node failure, and retries forever.
> 4) cluster_request_done() is never called, so we can't unblock the event list.
> 
> This can be reproduced reliably with the following script:
> ================
> 
> for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
> sleep 1
> collie/collie cluster format  -c 3
> echo create new vdis
> (
> for i in `seq 0 40`;do
> collie/collie vdi create test$i 4M
> done
> ) &
> 
> echo kill nodes
> sleep 1
> for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 1;done;
> 
> for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
> 
> echo wait for object recovery to finish
> for ((;;)); do
>         if [ "$(pgrep collie)" ]; then
>                 sleep 1
>         else
>                 break
>         fi
> done
> =================
> 
> The fix adds leave confchg events to the head of the event list. Join confchg
> events are untouched.
> 
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
>  sheep/cluster/corosync.c |   10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
> index 330cb71..ca737b9 100644
> --- a/sheep/cluster/corosync.c
> +++ b/sheep/cluster/corosync.c
> @@ -198,8 +198,8 @@ retry:
>  	return 0;
>  }
>  
> -static struct corosync_event *find_event(enum corosync_event_type type,
> -		struct cpg_node *sender)
> +static inline struct corosync_event *find_event(enum corosync_event_type type,
> +						struct cpg_node *sender)
>  {
>  	struct corosync_event *cevent;
>  
> @@ -561,7 +561,11 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
>  		cevent->type = COROSYNC_EVENT_TYPE_LEAVE;
>  		cevent->sender = left_sheep[i];
>  
> -		list_add_tail(&cevent->list, &corosync_event_list);
> +		/*
> +		 * Leave event would possibly be blocked by cluster request
> +		 * so we add it to the head of event list
> +		 */
> +		list_add(&cevent->list, &corosync_event_list);

Corosync delivers confchg events even if previous events still remain
in corosync_event_list, doesn't it?  Adding events to the head of the
queue breaks the guarantee that all nodes receive confchg events in
the same order, I think.

Thanks,

Kazutaka

>  	}
>  
>  	/* dispatch join_handler */
> -- 
> 1.7.10.2