[Sheepdog] [PATCH] sheep: handle CPG_EVENT_REQUEST even if CPG_EVENT_CONCHG exists

Thu Sep 15 04:30:17 CEST 2011

At Wed, 14 Sep 2011 11:24:14 +0800,
zituan at taobao.com wrote:
> 
> From: Yibin Shen <zituan at taobao.com>
> 
> This patch prevents a CPG_EVENT_CONCHG event from blocking VM I/Os.
> 
> In more detail: if a CPG_EVENT_CONCHG event occurs between a
> CPG_EVENT_DELIVER and CPG_EVENT_REQUEST event pair (for example,
> a vdi lookup operation followed by a meta object read operation),
> the whole cluster hangs forever because the meta object operation
> is blocked. This patch delays handling of the CPG_EVENT_CONCHG
> event.
> 
> Signed-off-by: Yibin Shen <zituan at taobao.com>
> ---
>  sheep/group.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/sheep/group.c b/sheep/group.c
> index eb0c4e2..b9dd9d7 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -1487,10 +1487,8 @@ do_retry:
>  	list_for_each_entry_safe(cevent, n, &sys->cpg_event_siblings, cpg_event_list) {
>  		struct request *req = container_of(cevent, struct request, cev);
>  
> -		if (cevent->ctype == CPG_EVENT_DELIVER)
> +		if (cevent->ctype == CPG_EVENT_DELIVER || cevent->ctype == CPG_EVENT_CONCHG)
>  			continue;
> -		if (cevent->ctype == CPG_EVENT_CONCHG)
> -			break;

The intention of this code is to flush all outstanding I/Os before
processing CPG_EVENT_CONCHG.  CPG_EVENT_CONCHG causes an epoch update,
and we want to avoid that while I/O requests are being processed, to
ensure strong data consistency.

The pending CPG_EVENT_CONCHG will be resumed after all outstanding I/Os
have finished, so I don't think this code is the problem.  If the event
isn't resumed properly, there is probably a bug somewhere else.  Do you
have steps to reproduce the hang?
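
To make the intent concrete, here is a minimal standalone sketch of the
scan.  It is not the actual sheep/group.c code: the names ev_type,
struct event, start_requests() and can_run_conchg() are hypothetical,
and the real loop does more checks.  It only shows the ordering rule:
requests queued before a membership change are started, requests queued
after it wait, and the membership change itself runs once no I/O is
outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types, named after the CPG_EVENT_* values. */
enum ev_type { EV_DELIVER, EV_REQUEST, EV_CONCHG };

struct event {
	enum ev_type type;
	bool started;	/* true once the request was handed to a worker */
};

/*
 * Walk the pending queue in order and start every I/O request that was
 * queued before the first membership change.  Requests behind the
 * membership change stay queued.  Returns the number of requests started.
 */
size_t start_requests(struct event *q, size_t n)
{
	size_t started = 0;

	assert(q != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (q[i].type == EV_DELIVER)
			continue;	/* not an I/O request, skip it */
		if (q[i].type == EV_CONCHG)
			break;		/* nothing behind it may start yet */
		if (!q[i].started) {
			q[i].started = true;
			started++;
		}
	}
	return started;
}

/* The membership change (epoch update) may run only when I/O is drained. */
bool can_run_conchg(size_t nr_outstanding_io)
{
	return nr_outstanding_io == 0;
}
```

With a queue of REQUEST, DELIVER, REQUEST, CONCHG, REQUEST, the first
two requests are started and the last one waits behind the CONCHG.  If
the hang is real, I would expect the counter that feeds something like
can_run_conchg() never to reach zero, which is why a reproducer would
help.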

Anyway, start_cpg_event_work() should be refactored to be more
readable, I think.

Thanks,

Kazutaka