This hang-up can be reproduced with the following steps:

1) Do intensive CPG_EVENT_DELIVER event operations, such as vdi
   lookup/add/del. You can run several sheepdog-backed VMs simultaneously.
2) Then stop corosync on some of the nodes, one after another.

On Thu, Sep 15, 2011 at 10:30 AM, MORITA Kazutaka
<morita.kazutaka at lab.ntt.co.jp> wrote:

> At Wed, 14 Sep 2011 11:24:14 +0800,
> zituan at taobao.com wrote:
> >
> > From: Yibin Shen <zituan at taobao.com>
> >
> > This patch prevents a CPG_EVENT_CONCHG event from blocking VM I/Os.
> >
> > for more details, if a CPG_EVENT_CONCHG event occurred inside the
> > CPG_EVENT_DELIVER and CPG_EVENT_REQUEST event pair (for example:
> > a vdi lookup operation followed by a meta object read operation),
> > then whole cluster will hang forever for the meta object operation
> > be blocked. this patch delays a CPG_EVENT_CONCHG event handling.
> >
> > Signed-off-by: Yibin Shen <zituan at taobao.com>
> > ---
> >  sheep/group.c |    4 +---
> >  1 files changed, 1 insertions(+), 3 deletions(-)
> >
> > diff --git a/sheep/group.c b/sheep/group.c
> > index eb0c4e2..b9dd9d7 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1487,10 +1487,8 @@ do_retry:
> >       list_for_each_entry_safe(cevent, n, &sys->cpg_event_siblings, cpg_event_list) {
> >               struct request *req = container_of(cevent, struct request, cev);
> >
> > -             if (cevent->ctype == CPG_EVENT_DELIVER)
> > +             if (cevent->ctype == CPG_EVENT_DELIVER || cevent->ctype == CPG_EVENT_CONCHG)
> >                       continue;
> > -             if (cevent->ctype == CPG_EVENT_CONCHG)
> > -                     break;
>
> The intention of this code is to flush all outstanding I/Os before
> processing CPG_EVENT_CONCHG. CPG_EVENT_CONCHG causes an epoch update,
> and we want to avoid it while processing I/O requests to ensure a
> strong data consistency.
>
> The pended CPG_EVENT_CONCHG will be resumed after all outstanding I/Os
> are finished, so I think this code isn't a problem. If the event
> isn't resumed properly, there should be a bug in another area.
> Are there steps to reproduce the hang-up?
>
> Anyway, start_cpg_event_work() should be refactored to be more
> readable, I think.
>
>
> Thanks,
>
> Kazutaka
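
For anyone following the thread, the scan loop touched by the quoted diff can
be modelled in a few lines of C. This is only a simplified sketch, not the code
in sheep/group.c: the enum, the array-based queue, and count_dispatched() are
hypothetical names introduced here for illustration. It shows how the original
"break" stops dispatching requests queued behind a CPG_EVENT_CONCHG, while the
patched "continue" lets them through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the event types named in the thread. */
enum ev_type { EV_REQUEST, EV_DELIVER, EV_CONCHG };

/*
 * Walk the pending events in order and count the I/O requests that
 * one pass of the scan would dispatch.
 *
 * stop_at_conchg != 0: behaviour before the patch (break at CONCHG).
 * stop_at_conchg == 0: behaviour after the patch (skip over CONCHG).
 */
size_t count_dispatched(const enum ev_type *events, size_t n,
                        int stop_at_conchg)
{
    size_t dispatched = 0;

    assert(events != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (events[i] == EV_DELIVER)
            continue;           /* deliver events are never dispatched here */
        if (events[i] == EV_CONCHG) {
            if (stop_at_conchg)
                break;          /* original: nothing behind CONCHG may start */
            continue;           /* patched: look past the pending CONCHG */
        }
        dispatched++;           /* an I/O request that may be started now */
    }
    return dispatched;
}
```

For a queue holding REQUEST, DELIVER, CONCHG, REQUEST in that order, the
pre-patch scan dispatches one request and the patched scan dispatches two.
Whether letting the second request start before the epoch update is safe is
exactly the consistency concern raised in the quoted reply.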