[Sheepdog] [PATCH 1/2] sheep: handle node change event first
Liu Yuan
namei.unix at gmail.com
Sun Apr 1 07:24:23 CEST 2012
On 04/01/2012 12:16 PM, MORITA Kazutaka wrote:
> At Sun, 01 Apr 2012 11:58:00 +0800,
> Liu Yuan wrote:
>>
>> On 04/01/2012 11:41 AM, MORITA Kazutaka wrote:
>>
>>> At Sat, 31 Mar 2012 18:31:00 +0800,
>>> Liu Yuan wrote:
>>>>
>>>> On 03/31/2012 06:23 PM, MORITA Kazutaka wrote:
>>>>
>>>>> Many bad effects. For example, imagine that join messages are
>>>>> processed in a different order than on other nodes.
>>>>
>>>>
>>>> Maybe not. I notice that every call to start_cpg_event_work() drains
>>>> the cpg queue, so this change assures us that confchg will be handled
>>>> for sure, regardless of other requests.
>>>
>>> No, membership change events are blocked until all outstanding I/O
>>> requests are flushed or the previous membership change event is
>>> finished. There are cases where the cpg queue is not empty after
>>> start_cpg_event_work() has been called.
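>>>
>>> To make that concrete, the gating looks roughly like this (just a
>>> sketch; the names are illustrative, not the exact ones in the sheep
>>> code):
>>>
>>> #include <stdbool.h>
>>>
>>> enum cpg_event_type { CPG_EVENT_CONCHG, CPG_EVENT_REQUEST };
>>>
>>> struct cpg_event {
>>>     enum cpg_event_type ctype;
>>>     struct cpg_event *next;       /* simple FIFO of queued events */
>>> };
>>>
>>> static struct cpg_event *cpg_queue; /* head of the event queue */
>>> static int nr_outstanding_io;       /* in-flight I/O requests */
>>> static bool confchg_running;        /* previous confchg unfinished */
>>>
>>> static void process_event(struct cpg_event *ev) { (void)ev; /* ... */ }
>>>
>>> static void start_cpg_event_work(void)
>>> {
>>>     struct cpg_event *ev = cpg_queue;
>>>
>>>     if (!ev)
>>>         return;
>>>     /* A confchg at the head stays queued while I/Os are outstanding
>>>      * or an earlier confchg is still running, so the queue can be
>>>      * non-empty when this function returns. */
>>>     if (ev->ctype == CPG_EVENT_CONCHG &&
>>>         (nr_outstanding_io > 0 || confchg_running))
>>>         return;
>>>     cpg_queue = ev->next;
>>>     if (ev->ctype == CPG_EVENT_CONCHG)
>>>         confchg_running = true;   /* cleared when the work completes */
>>>     process_event(ev);
>>> }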
>>>
>>>>
>>>> We both run dd in guests and loop creating a new vdi and deleting
>>>> that vdi during the join/leave test.
>>>>
>>>> All seems good so far... look at the sequence for joining 60 nodes.
>>>
>>> This is a timing problem. I think the problem would happen in other
>>> environments.
>>>
>>> Let's take another approach. Here is my suggestion:
>>>
>>> - Use different queues for I/O requests and membership events.
>>> - When the membership queue is empty, we can process I/O requests as
>>>   usual.
>>> - When the membership queue is not empty, flush all outstanding I/Os.
>>>   New I/O requests are blocked until the membership queue becomes
>>>   empty.
>>> - SD_OP_SHUTDOWN and SD_OP_MAKE_FS should be pushed to the membership
>>>   queue, and other operations to the I/O request queue (a sketch of
>>>   this dispatch rule follows).
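>>>
>>> A minimal sketch of that dispatch rule (again, the queue and helper
>>> names here are illustrative, not actual sheep code):
>>>
>>> #include <stdbool.h>
>>>
>>> struct event_queue { int len; /* ... */ };
>>>
>>> static struct event_queue io_queue;         /* ordinary I/O requests */
>>> static struct event_queue membership_queue; /* confchg, SHUTDOWN, MAKE_FS */
>>> static int nr_outstanding_io;               /* I/Os already in flight */
>>>
>>> static bool queue_empty(struct event_queue *q) { return q->len == 0; }
>>> static void process_io(struct event_queue *q) { (void)q; /* ... */ }
>>> static void process_membership(struct event_queue *q) { (void)q; /* ... */ }
>>>
>>> static void dispatch(void)
>>> {
>>>     if (queue_empty(&membership_queue)) {
>>>         /* No membership event pending: I/O proceeds as usual. */
>>>         process_io(&io_queue);
>>>         return;
>>>     }
>>>     /* A membership event is pending: stop feeding new I/Os and wait
>>>      * for the outstanding ones to drain, then handle the event. */
>>>     if (nr_outstanding_io == 0)
>>>         process_membership(&membership_queue);
>>> }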
>>>
>>
>>
>> I considered split queues. Initially, I planned to solve it that way.
>> But after analysis, I don't think it is necessary.
>>
>> I think we need to handle the membership change first, before flushing
>> I/O requests, because I/O requests don't know the routing; we need to
>> feed them the freshest membership.
>>
>> The timing problem doesn't exist at all. The cluster driver assures us
>> of the order of events, and I just reorder notify & confchg relative to
>> I/O events. The internal order of notify and confchg is maintained;
>> these two patches and the whole cpg working mechanism allow both notify
>> & confchg to work as 'once it happens, it is handled immediately'.
>
> Consider the following simple case:
>
> 1. There are two nodes, A and B.
> 2. Only node A has outstanding I/O.
> 3. New nodes, C and D, join Sheepdog.
> 4. Node B processes two join messages, "join C" and "join D".
> 5. Node A doesn't process the join messages until the outstanding
> I/O is finished. If you push the join messages to the head of the
> cpg queue, the messages are processed in the reverse order ("join
> D" first) because start_cpg_event_work() processes events from the
> head of the queue.
>
No, this doesn't happen. With my patch, step 5 becomes:

5. Node A processes 'join C' first, then 'join D', because
1) the cluster driver broadcasts the join messages in this order, and
2) start_cpg_event_work() strictly serializes the sequence, which
guarantees processing 'join C' before 'join D'. That is, 'join D'
cannot be inserted ahead of 'join C' until 'join C' has been processed.
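
In code terms, the serialization is enforced at completion time; here
is a sketch (the names are illustrative, not the actual sheep code):

#include <stdbool.h>

/* At most one confchg work item is in flight at any moment. */
static bool confchg_running;

void start_cpg_event_work(void);    /* retries the head of the queue */

/* Completion handler for a confchg work item (e.g. 'join C'). Only
 * here may the next confchg ('join D') be dispatched, so confchg
 * events are handled strictly in arrival order. */
static void confchg_done(void)
{
    confchg_running = false;
    start_cpg_event_work();
}
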
Thanks,
Yuan