[Sheepdog] [PATCH 1/2] sheep: sheep: handle node change event first

MORITA Kazutaka morita.kazutaka at gmail.com
Sun Apr 1 07:59:49 CEST 2012


At Sun, 01 Apr 2012 13:24:23 +0800,
Liu Yuan wrote:
> 
> On 04/01/2012 12:16 PM, MORITA Kazutaka wrote:
> 
> > At Sun, 01 Apr 2012 11:58:00 +0800,
> > Liu Yuan wrote:
> >>
> >> On 04/01/2012 11:41 AM, MORITA Kazutaka wrote:
> >>
> >>> At Sat, 31 Mar 2012 18:31:00 +0800,
> >>> Liu Yuan wrote:
> >>>>
> >>>> On 03/31/2012 06:23 PM, MORITA Kazutaka wrote:
> >>>>
> >>>>> Many bad effects.  For example, imagine that join messages are
> >>>>> processed in the different order with other nodes.
> >>>>
> >>>>
> >>>> Maybe not. I notice that every call to start_cpg_event_work() will drain
> >>>> the cpg queue, so this change ensures that confchg will be handled for
> >>>> sure, regardless of other requests.
> >>>
> >>> No, membership change events are blocked until all outstanding I/O
> >>> requests are flushed or the previous membership change event has
> >>> finished.  There are cases where the cpg queue is not empty after
> >>> start_cpg_event_work() is called.
> >>>
> >>>>
> >>>> We do both dd in guests and a loop that creates a new vdi and deletes
> >>>> that vdi during the join/leave test.
> >>>>
> >>>> All seems good so far... look at the sequence for joining 60 nodes
> >>>
> >>> This is a timing problem.  I think the problem would happen in other
> >>> environments.
> >>>
> >>> Let's take another approach.  Here is my suggestion:
> >>>
> >>>  - Use different queues for I/O requests and membership events.
> >>>  - When the membership queue is empty, we can process I/O requests as
> >>>    usual.
> >>>  - When the membership queue is not empty, flush all outstanding I/Os.
> >>>    New I/O requests are blocked until the membership queue becomes
> >>>    empty.
> >>>  - SD_OP_SHUTDOWN and SD_OP_MAKE_FS should be pushed to the membership
> >>>    queue, and other operations should be pushed to the I/O request queue.
> >>>
> >>
> >>
> >> I considered split queues. Initially, I planned to solve it that way.
> >> But after analysis, I don't think it is necessary.
> >>
> >> I think we need to handle the membership change first, before flushing
> >> I/O requests, because an I/O request doesn't know the routing; we need to
> >> feed them the freshest membership.
> >>
> >> The timing problem doesn't exist at all. The cluster driver would assure
> >> us the order of events, and I just reorder notify & confchg relative to I/O
> >> events. The internal order of notify and confchg is maintained; these two
> >> patches and the whole cpg working mechanism will allow both notify &
> >> confchg to work as 'once it happens, it is handled immediately'.
> > 
> > Consider the following simple case:
> > 
> >  1. There are two nodes, A and B.
> >  2. Only node A has the outstanding I/O.
> >  3. New nodes, C and D, join Sheepdog.
> >  4. Node B processes two join messages, "join C" and "join D".
> >  5. Node A doesn't process the join messages until the outstanding
> >     I/O is finished.  If you push the join messages to the head of the
> >     cpg queue, the messages are processed in the reverse order ("join
> >     D" first) because start_cpg_event_work() processes events from the
> >     head of the queue.
> > 
> 
> 
> No, this doesn't happen.
> 
> With my patch, step 5 is:
>  5. Node A processes 'join C' first, then 'join D', because
> 	1) the cluster driver broadcasts the join messages in this order
> 	2) start_cpg_event_work() strictly serializes the sequence, which
> guarantees processing 'join C' before 'join D'.

'join C' is processed only after there are no outstanding I/Os.

https://github.com/collie/sheepdog/blob/master/sheep/group.c#L1181

So start_cpg_event_work() cannot guarantee that sheep processes 'join
C' before 'join D' if sheep adds the 'join D' event to the head of
cpg_event_siblings while it still contains the 'join C' event.


> That is, you can insert D
> before C until the C is processed.

If 'join C' is blocked due to outstanding I/Os and sd_join_handler()
of 'join D' is called before processing 'join C', 'join D' is pushed
to the head of cpg_event_siblings which includes 'join C', and
start_cpg_event_work() will process 'join D' first.
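
To make the reordering concrete, here is a minimal, self-contained
sketch of the scenario above.  It is not sheepdog code: struct event,
struct queue, queue_add_head(), queue_add_tail() and processing_order()
are hypothetical names used only for illustration, and the list merely
stands in for cpg_event_siblings.  It assumes 'join C' is still queued
(blocked by outstanding I/O) when 'join D' arrives, and that events are
processed from the head once the I/O is finished.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a queued membership event (not sheepdog's struct). */
struct event {
	const char *name;
	struct event *prev, *next;
};

/* Circular doubly linked list with a sentinel node, in the style of list_head. */
struct queue {
	struct event head;
};

static void queue_init(struct queue *q)
{
	q->head.name = NULL;
	q->head.prev = q->head.next = &q->head;
}

/* Insert right after the sentinel: the new event is processed first. */
static void queue_add_head(struct queue *q, struct event *ev)
{
	ev->next = q->head.next;
	ev->prev = &q->head;
	q->head.next->prev = ev;
	q->head.next = ev;
}

/* Insert right before the sentinel: the new event is processed last. */
static void queue_add_tail(struct queue *q, struct event *ev)
{
	ev->prev = q->head.prev;
	ev->next = &q->head;
	q->head.prev->next = ev;
	q->head.prev = ev;
}

/*
 * Queue 'join C', then 'join D' while 'join C' is still pending, and
 * write the order in which the events would be processed from the head
 * of the queue into out, separated by commas.
 */
static void processing_order(int push_to_head, char *out, size_t len)
{
	struct queue q;
	struct event join_c = { "join C", NULL, NULL };
	struct event join_d = { "join D", NULL, NULL };
	void (*add)(struct queue *, struct event *) =
		push_to_head ? queue_add_head : queue_add_tail;
	struct event *ev;

	if (len == 0)
		return;
	out[0] = '\0';

	queue_init(&q);
	add(&q, &join_c);	/* blocked: I/O is still outstanding */
	add(&q, &join_d);	/* arrives before 'join C' is processed */

	/* The I/O has finished; walk the queue from the head. */
	for (ev = q.head.next; ev != &q.head; ev = ev->next) {
		size_t used = strlen(out);

		snprintf(out + used, len - used, "%s%s",
			 used ? "," : "", ev->name);
	}
}
```

With a buffer such as char buf[64], processing_order(1, buf, sizeof(buf))
fills buf with "join D,join C", while processing_order(0, buf,
sizeof(buf)) fills it with "join C,join D".  Only tail insertion keeps
the order in which the cluster driver delivered the events.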


I guess we aren't talking about the same thing...


Thanks,

Kazutaka


