[Sheepdog] [PATCH 1/2] sheep: handle node change event first
MORITA Kazutaka
morita.kazutaka at gmail.com
Sun Apr 1 06:16:26 CEST 2012
At Sun, 01 Apr 2012 11:58:00 +0800,
Liu Yuan wrote:
>
> On 04/01/2012 11:41 AM, MORITA Kazutaka wrote:
>
> > At Sat, 31 Mar 2012 18:31:00 +0800,
> > Liu Yuan wrote:
> >>
> >> On 03/31/2012 06:23 PM, MORITA Kazutaka wrote:
> >>
> >>> Many bad effects. For example, imagine that join messages are
> >>> processed in a different order than on other nodes.
> >>
> >>
> >> Maybe not. I notice that every call to start_cpg_event_work() will drain
> >> the cpg queue, so this change assures us that confchg will be
> >> handled for sure, regardless of other requests.
> >
> > No, membership change events are blocked until all outstanding I/O
> > requests are flushed or the previous membership change event is
> > finished. There are cases where the cpg queue is not empty after
> > start_cpg_event_work() was called.
> >
> >>
> >> We both run dd in guests and loop creating a new vdi and deleting
> >> that vdi during the join/leave test.
> >>
> >> All seems good so far... look at the sequence for joining 60 nodes
> >
> > This is a timing problem. I think the problem could happen in other
> > environments.
> >
> > Let's take another approach. Here is my suggestion:
> >
> > - Use different queues for I/O requests and membership events.
> > - When membership queue is empty, we can process I/O requests as
> > usual.
> > - When membership queue is not empty, flush all outstanding I/Os.
> > New I/O requests are blocked until the membership queue becomes
> > empty.
> > - SD_OP_SHUTDOWN and SD_OP_MAKE_FS should be pushed to the membership
> > queue, and other operations are pushed to the I/O request queue.
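[Editorial note: a minimal C sketch of the split-queue scheme suggested above. All names (`event_queue`, `next_event`, `nr_outstanding_io`) are hypothetical illustrations, not Sheepdog's actual code.]

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Toy ring buffer standing in for a work queue. */
struct event_queue {
	int events[QUEUE_CAP];
	size_t head, tail;
};

static bool queue_empty(const struct event_queue *q)
{
	return q->head == q->tail;
}

static void queue_push(struct event_queue *q, int ev)
{
	q->events[q->tail++ % QUEUE_CAP] = ev; /* sketch: no overflow check */
}

static int queue_pop(struct event_queue *q)
{
	return q->events[q->head++ % QUEUE_CAP];
}

/* I/O requests currently in flight (the "outstanding I/Os" above). */
static int nr_outstanding_io;

/*
 * Pick the next event to run, following the rules above:
 * - membership queue empty: process I/O as usual;
 * - membership queue non-empty: block new I/O, wait for the
 *   outstanding I/O to drain, then process the membership event.
 * Returns the event, or -1 if nothing can run yet.
 */
static int next_event(struct event_queue *membership_q,
		      struct event_queue *io_q)
{
	if (!queue_empty(membership_q)) {
		if (nr_outstanding_io > 0)
			return -1; /* still flushing outstanding I/O */
		return queue_pop(membership_q);
	}
	if (!queue_empty(io_q))
		return queue_pop(io_q);
	return -1;
}
```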
> >
>
>
> I considered split queues. Initially, I planned to solve it that way,
> but after analysis, I don't think it is necessary.
>
> I think we need to handle the membership change first, before flushing I/O
> requests, because I/O requests don't know the routing; we need to feed
> them the freshest membership.
>
> The timing problem doesn't exist at all. The cluster driver assures
> us of the order of events, and I just reorder notify & confchg with I/O
> events. The internal order of notify and confchg is maintained; these two
> patches and the whole cpg working mechanism allow both notify &
> confchg to work as 'once it happens, it is handled immediately'.
Consider the following simple case:

1. There are two nodes, A and B.
2. Only node A has outstanding I/O.
3. New nodes, C and D, join Sheepdog.
4. Node B processes two join messages, "join C" and "join D".
5. Node A doesn't process the join messages until the outstanding
   I/O is finished. If you push the join messages to the head of the
   cpg queue, the messages are processed in the reverse order ("join
   D" first), because start_cpg_event_work processes events from the
   head of the queue.
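The reversal in step 5 can be sketched with a toy list. This is only an illustration of head insertion into a head-consumed queue, not Sheepdog's actual cpg_event list code:

```c
#include <stdlib.h>

/* Minimal singly linked list consumed from the head. */
struct node {
	int msg;
	struct node *next;
};

/* Pushing to the head, as the patch under discussion does. */
static void push_head(struct node **list, int msg)
{
	struct node *n = malloc(sizeof(*n));
	n->msg = msg;
	n->next = *list;
	*list = n;
}

/* The worker pops from the head, so the last-pushed event runs first. */
static int pop_head(struct node **list)
{
	struct node *n = *list;
	int msg = n->msg;
	*list = n->next;
	free(n);
	return msg;
}
```

Two events pushed in arrival order ("join C", then "join D") therefore come back out in reverse order, which is exactly the inconsistency between nodes described above.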
Thanks,
Kazutaka