At Wed, 14 Sep 2011 11:24:14 +0800,
zituan at taobao.com wrote:
>
> From: Yibin Shen <zituan at taobao.com>
>
> This patch prevents a CPG_EVENT_CONCHG event from blocking VM I/Os.
>
> for more details, if a CPG_EVENT_CONCHG event occurred inside the
> CPG_EVENT_DELIVER and CPG_EVENT_REQUEST event pair(for example:
> a vdi lookup operation followed by a meta object read operation),
> then whole cluster will hang forever for the meta object operation
> be blocked. this patch delays a CPG_EVENT_CONCHG event handling.
>
> Signed-off-by: Yibin Shen <zituan at taobao.com>
> ---
>  sheep/group.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/sheep/group.c b/sheep/group.c
> index eb0c4e2..b9dd9d7 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -1487,10 +1487,8 @@ do_retry:
>  	list_for_each_entry_safe(cevent, n, &sys->cpg_event_siblings, cpg_event_list) {
>  		struct request *req = container_of(cevent, struct request, cev);
>
> -		if (cevent->ctype == CPG_EVENT_DELIVER)
> +		if (cevent->ctype == CPG_EVENT_DELIVER || cevent->ctype == CPG_EVENT_CONCHG)
>  			continue;
> -		if (cevent->ctype == CPG_EVENT_CONCHG)
> -			break;

The intention of this code is to flush all outstanding I/Os before
processing CPG_EVENT_CONCHG. CPG_EVENT_CONCHG causes an epoch update,
and we want to avoid that while I/O requests are being processed, to
ensure strong data consistency. The pended CPG_EVENT_CONCHG will be
resumed after all outstanding I/Os are finished, so I think this code
isn't a problem. If the event isn't resumed properly, there should be
a bug in another area.

Are there steps to reproduce the hang-up?

Anyway, start_cpg_event_work() should be refactored to be more
readable, I think.

Thanks,

Kazutaka
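
P.S. To make the ordering described above concrete, here is a toy model
in C. It is only a sketch under my own assumptions, not sheepdog code:
struct sim, push_event(), run_events() and complete_io() are
hypothetical names, and the EV_* values merely stand in for the
CPG_EVENT_* types. It shows requests queued ahead of a membership change
being dispatched, the membership change pending while I/O is
outstanding, and the epoch being bumped only once the I/O has drained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CPG_EVENT_* types named in the patch. */
enum ev_type { EV_DELIVER, EV_REQUEST, EV_CONCHG };

#define QUEUE_MAX 16

/* Toy cluster state; these are not sheepdog's real structures. */
struct sim {
    enum ev_type queue[QUEUE_MAX]; /* pending events, oldest first */
    size_t len;
    int outstanding_io;            /* requests dispatched, not yet completed */
    int epoch;                     /* bumped once per processed CONCHG */
};

static void push_event(struct sim *s, enum ev_type t)
{
    assert(s->len < QUEUE_MAX);
    s->queue[s->len++] = t;
}

static void pop_head(struct sim *s)
{
    for (size_t i = 1; i < s->len; i++)
        s->queue[i - 1] = s->queue[i];
    s->len--;
}

static void run_events(struct sim *s)
{
    for (;;) {
        size_t i = 0, keep = 0;

        /* Dispatch the requests queued ahead of the first CONCHG. */
        for (; i < s->len; i++) {
            if (s->queue[i] == EV_DELIVER) {
                s->queue[keep++] = s->queue[i]; /* skipped, stays queued */
                continue;
            }
            if (s->queue[i] == EV_CONCHG)
                break;                          /* barrier: stop here */
            s->outstanding_io++;                /* REQUEST goes to I/O */
        }
        for (; i < s->len; i++)                 /* keep everything after it */
            s->queue[keep++] = s->queue[i];
        s->len = keep;

        if (s->len == 0)
            return;
        if (s->queue[0] == EV_CONCHG) {
            if (s->outstanding_io > 0)
                return;                         /* pend until I/O drains */
            s->epoch++;                         /* safe: nothing in flight */
        }
        pop_head(s);    /* head was a DELIVER or the processed CONCHG */
    }
}

/* An I/O completion is what resumes a pended CONCHG in this model. */
static void complete_io(struct sim *s)
{
    assert(s->outstanding_io > 0);
    s->outstanding_io--;
    run_events(s);
}
```

In this model a hang would require complete_io() never to be called for
some dispatched request, which is why I suspect the real problem lies
outside the loop touched by the patch.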