[Sheepdog] [PATCH] sheep: handle CPG_EVENT_REQUEST even if CPG_EVENT_CONCHG exists

Yibin Shen kkbaal at gmail.com
Thu Sep 15 05:15:38 CEST 2011


This hang-up can be reproduced with the following steps:
1) Generate a heavy load of CPG_EVENT_DELIVER events, for example with
   vdi lookup/add/del operations.
   You can run several VMs backed by sheepdog storage at the same time.
2) Then stop corosync on some of the nodes, one after another.
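
To make the scenario concrete, here is a minimal sketch of the scan loop
the patch touches. It is not the real sheep/group.c code: the enum is a
simplified stand-in, and count_reachable_requests() and example_queue are
hypothetical names used only for illustration. It assumes the rest of the
loop body hands each REQUEST it reaches to a worker.

```c
#include <stddef.h>

/* Simplified stand-ins for the event types used in sheep/group.c. */
enum cpg_event_type {
	CPG_EVENT_CONCHG,
	CPG_EVENT_DELIVER,
	CPG_EVENT_REQUEST,
};

/*
 * Hypothetical helper: count the REQUEST events a single scan reaches.
 * stop_at_conchg = 1 models the current code (break on CONCHG).
 * stop_at_conchg = 0 models the patch (skip CONCHG like DELIVER).
 */
static size_t count_reachable_requests(const enum cpg_event_type *queue,
				       size_t len, int stop_at_conchg)
{
	size_t i, reached = 0;

	for (i = 0; i < len; i++) {
		if (queue[i] == CPG_EVENT_DELIVER)
			continue;
		if (queue[i] == CPG_EVENT_CONCHG) {
			if (stop_at_conchg)
				break;	/* nothing behind CONCHG is looked at */
			continue;
		}
		reached++;	/* a REQUEST the scan could hand to a worker */
	}
	return reached;
}

/* Queue from the scenario: vdi lookup, membership change, meta object read. */
static const enum cpg_event_type example_queue[] = {
	CPG_EVENT_DELIVER,
	CPG_EVENT_CONCHG,
	CPG_EVENT_REQUEST,
};
```

With example_queue, the scan reaches no request when it stops at
CPG_EVENT_CONCHG and reaches the meta object read when it skips it. This
only models which requests the scan can see; it says nothing about whether
skipping is safe with respect to epoch updates.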

On Thu, Sep 15, 2011 at 10:30 AM, MORITA Kazutaka <
morita.kazutaka at lab.ntt.co.jp> wrote:

> At Wed, 14 Sep 2011 11:24:14 +0800,
> zituan at taobao.com wrote:
> >
> > From: Yibin Shen <zituan at taobao.com>
> >
> > This patch prevents a CPG_EVENT_CONCHG event from blocking VM I/Os.
> >
> > In more detail: if a CPG_EVENT_CONCHG event occurs between a
> > CPG_EVENT_DELIVER event and its paired CPG_EVENT_REQUEST event (for
> > example, a vdi lookup operation followed by a meta object read
> > operation), the whole cluster hangs forever because the meta object
> > operation is blocked. This patch delays handling of the
> > CPG_EVENT_CONCHG event.
> >
> > Signed-off-by: Yibin Shen <zituan at taobao.com>
> > ---
> >  sheep/group.c |    4 +---
> >  1 files changed, 1 insertions(+), 3 deletions(-)
> >
> > diff --git a/sheep/group.c b/sheep/group.c
> > index eb0c4e2..b9dd9d7 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1487,10 +1487,8 @@ do_retry:
> >       list_for_each_entry_safe(cevent, n, &sys->cpg_event_siblings, cpg_event_list) {
> >               struct request *req = container_of(cevent, struct request, cev);
> >
> > -             if (cevent->ctype == CPG_EVENT_DELIVER)
> > +             if (cevent->ctype == CPG_EVENT_DELIVER || cevent->ctype == CPG_EVENT_CONCHG)
> >                       continue;
> > -             if (cevent->ctype == CPG_EVENT_CONCHG)
> > -                     break;
>
> The intention of this code is to flush all outstanding I/Os before
> processing CPG_EVENT_CONCHG.  CPG_EVENT_CONCHG causes an epoch update,
> and we want to avoid it while processing I/O requests to ensure a
> strong data consistency.
>
> The pending CPG_EVENT_CONCHG will be resumed after all outstanding I/Os
> are finished, so I think this code isn't a problem.  If the event
> isn't resumed properly, there should be a bug in another area.  Are
> there steps to reproduce the hang-up?
>
> Anyway, start_cpg_event_work() should be refactored to be more
> readable, I think.
>
>
> Thanks,
>
> Kazutaka