[sheepdog] [PATCH v3] corosync: fix cluster hang by cluster requests blocking confchg

Liu Yuan namei.unix at gmail.com
Thu Jul 5 13:36:04 CEST 2012


From: Liu Yuan <tailai.ly at taobao.com>

v3:
 - corosync delivers node events one at a time, so we can simply add the leave
   event to the head
------------------------------------------------ >8

This hang is caused by a cluster request (adding a new vdi):

1) a cluster request blocks the cluster and waits for its worker to finish.
2) a confchg happens, but is queued after this cluster request.
3) cluster_request_fn() issues a write request, which always fails because of
   the node failure, and retries forever.
4) cluster_request_done() is never called, so we can't unblock the event list.
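
To illustrate the four steps, here is a minimal sketch of a FIFO dispatch loop
that stalls behind an unfinished request. It is a toy model only, not the
sheepdog code: toy_event, toy_type and toy_dispatch() are hypothetical names,
and the real driver keeps its events on a linked list rather than in an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; only loosely modelled on the driver's types. */
enum toy_type { TOY_REQUEST, TOY_LEAVE };

struct toy_event {
	enum toy_type type;
	bool blocked;	/* set once the request has started blocking the queue */
	bool done;	/* set when the request's worker has completed */
};

/*
 * Walk the queue in FIFO order and return how many events were handled
 * before the queue stalled or ran out.
 */
static size_t toy_dispatch(struct toy_event *q, size_t n)
{
	size_t handled = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].type == TOY_REQUEST && !q[i].done) {
			q[i].blocked = true;
			break;	/* everything queued behind it must wait */
		}
		handled++;
	}
	return handled;
}
```

With a request at index 0 and a leave event at index 1, toy_dispatch() handles
nothing until the request is marked done. In the hang described above that
never happens, because the worker needs the leave event to stop retrying.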

This can be reproduced reliably with the following script:
================

for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
sleep 1
collie/collie cluster format  -c 3
echo create new vdis
(
for i in `seq 0 40`;do
collie/collie vdi create test$i 4M
done
) &

echo kill nodes
sleep 1
for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 1;done;

for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done

echo wait for object recovery to finish
for ((;;)); do
        if [ "$(pgrep collie)" ]; then
                sleep 1
        else
                break
        fi
done
=================

The fix adds the leave confchg to the head of the event list. The join confchg
is untouched.
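
As a rough sketch of why head insertion helps, the following models a circular
doubly linked list in the style of the kernel list helpers. The toy_ names are
hypothetical stand-ins written for this example, not the helpers sheep uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node between two adjacent entries. */
static void toy_list_insert(struct toy_list *node, struct toy_list *prev,
			    struct toy_list *next)
{
	next->prev = node;
	node->next = next;
	node->prev = prev;
	prev->next = node;
}

/* Insert right after head: the node becomes the first entry. */
static void toy_list_add(struct toy_list *node, struct toy_list *head)
{
	toy_list_insert(node, head, head->next);
}

/* Insert right before head: the node becomes the last entry. */
static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	toy_list_insert(node, head->prev, head);
}
```

If a request is queued with toy_list_add_tail() and a leave event is then added
with toy_list_add(), the leave event sits in front of the request and is seen
first by a loop that walks the list from the head. Since corosync delivers node
events one at a time, head insertion is not expected to reorder leave events
among themselves.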

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 sheep/cluster/corosync.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
index 330cb71..ca737b9 100644
--- a/sheep/cluster/corosync.c
+++ b/sheep/cluster/corosync.c
@@ -198,8 +198,8 @@ retry:
 	return 0;
 }
 
-static struct corosync_event *find_event(enum corosync_event_type type,
-		struct cpg_node *sender)
+static inline struct corosync_event *find_event(enum corosync_event_type type,
+						struct cpg_node *sender)
 {
 	struct corosync_event *cevent;
 
@@ -561,7 +561,11 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
 		cevent->type = COROSYNC_EVENT_TYPE_LEAVE;
 		cevent->sender = left_sheep[i];
 
-		list_add_tail(&cevent->list, &corosync_event_list);
+		/*
+		 * Leave event would possibly be blocked by cluster request
+		 * so we add it to the head of event list
+		 */
+		list_add(&cevent->list, &corosync_event_list);
 	}
 
 	/* dispatch join_handler */
-- 
1.7.10.2



