[Sheepdog] [PATCH] sheep: bug in event_done leads to dead lock

Liu Yuan namei.unix at gmail.com
Fri Apr 27 17:16:11 CEST 2012


On 04/27/2012 09:15 PM, Yunkai Zhang wrote:

> From: Yunkai Zhang <qiushu.zyk at taobao.com>
> 
> A deadlock was found in the following scenario:
> 
> Suppose that there are two sheep, S1 and S2, and their event_queues
> are empty.
> 
> Now S1 receives a notify message, M1, and calls sd_notify_handler(),
> which adds a notify event to its event_queue and then calls
> process_request_event_queues() to queue_work this event.
> 
> At the same time, S2 sends a notify message, M2, to the cluster, and an
> I/O request (e.g. a do_lookup_vdi operation) is submitted to S1 when
> S2 calls zk_dispatch() to handle M2.
> 
> After S1 receives the I/O request from S2, it eventually calls
> process_request_event_queues() to deal with it. If S1 calls this
> function before M1's event_done() has finished, the I/O request is
> not processed because the event_queue is not empty, and event_done()
> does not call process_request_event_queues() again once the
> event_queue is empty. This leads to a deadlock between S1 and S2:
> S2 blocks in read() waiting for the response from S1, and the whole
> cluster is suspended forever.
> 
> To fix this problem, we modify event_done() so that it always calls
> process_request_event_queues(), which lets the request_queue be
> processed once the event_queue is empty.
> 
> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
> ---
>  sheep/group.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/sheep/group.c b/sheep/group.c
> index b4cf2da..7e19d33 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -964,8 +964,7 @@ static void event_done(struct work *work)
>  	if (ret)
>  		panic("failed to register event fd");
>  
> -	if (!list_empty(&sys->event_queue))
> -		process_request_event_queues();
> +	process_request_event_queues();
>  }
>  
>  int is_access_to_busy_objects(uint64_t oid)
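
For readers following the scenario above, here is a minimal, self-contained
sketch of the race. It is a simplified model, not sheepdog code: the struct
and every sim_* name are hypothetical, the queues are reduced to counters,
and it assumes only what the description states, namely that requests are
dispatched only while no event is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one sheep's queues (not sheepdog code). */
struct sim_node {
	int event_queue_len;    /* events queued or still running */
	int request_queue_len;  /* I/O requests waiting to be dispatched */
	int requests_processed; /* I/O requests handed to workers */
};

/*
 * Stand-in for process_request_event_queues(): requests are dispatched
 * only when no event is pending (assumption taken from the description).
 */
static void sim_process_queues(struct sim_node *n)
{
	if (n->event_queue_len > 0)
		return; /* an event is still pending: requests must wait */

	n->requests_processed += n->request_queue_len;
	n->request_queue_len = 0;
}

/*
 * Stand-in for event_done(). With patched == false it mirrors the old
 * code, which re-ran the queues only if another event was waiting.
 */
static void sim_event_done(struct sim_node *n, bool patched)
{
	assert(n->event_queue_len > 0);
	n->event_queue_len--;

	if (patched || n->event_queue_len > 0)
		sim_process_queues(n);
}

/* Replays the scenario on S1 and returns the number of stranded requests. */
static int sim_scenario(bool patched)
{
	struct sim_node s1 = { 0, 0, 0 };

	s1.event_queue_len = 1;   /* M1 arrives and its event is queued */
	s1.request_queue_len = 1; /* the I/O request from S2 arrives */
	sim_process_queues(&s1);  /* skipped: M1's event has not finished */
	sim_event_done(&s1, patched);

	return s1.request_queue_len;
}
```

In this model, sim_scenario(false) leaves the request from S2 in the queue
with nothing left to trigger its dispatch, while sim_scenario(true) drains it
as soon as M1's event completes.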


Applied. Thanks

Yuan


