[sheepdog] [PATCH 0/8] make IO requests to wait in recovery instead of busy retrying

levin li levin108 at gmail.com
Tue May 22 12:36:30 CEST 2012


On 05/22/2012 03:10 PM, Liu Yuan wrote:
> On 05/22/2012 10:51 AM, levin li wrote:
>
>> There are many cases where a failed request is retried without waiting:
>> it is put straight back into the request queue to run again, which
>> can keep the CPU too busy.
>>
>> In our cluster with 960 nodes, when 10 nodes leave and there are
>> heavy IO requests in the VMs, recovery doesn't run well because there are
>> too many pending requests in the request queue retrying IO,
>> which keeps the CPU too busy to process the recovery requests.
>>
>> Also, there's a race condition in recovery which keeps nodes retrying
>> to recover a single object and makes the recovery work hang there.
>>
>> We should not make the request retry as soon as it fails; instead we
>> should put it into a queue and let it sleep until the epoch or other
>> needs of this request are met, then wake it up to retry.
>>
>
>
> From a skim over the patch set, I think it needs a rethink of
> list usage. It takes several lists to implement the 'wait queue' concept,
> on both requester and requestee nodes, which looks unnecessary to me, and
> the code doesn't comment on why it uses more than one queue. What makes them
> different from each other?
>
> It would be much better if we could unify those queues into a single
> queue, which we rely on to queue, resume, and flush requests.
>
> Thanks,
> Yuan

Using one list is simple and easy to implement, and it's what I did 
at first. But there's a problem: under heavy IO there can be many 
pending requests, blocked either by epoch inconsistency or by an object 
in recovery, so the list may get very large, and it's inefficient to 
traverse a large list to find which requests can be woken up during 
recovery.

In my implementation, after recovering one object, sheep tries to wake 
up any pending request waiting for that object. This requires 
traversing the list; if all requests are in one list, it's really time 
consuming and inefficient.
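
To illustrate the point, here is a minimal sketch in C, not the code 
from the patch set. The names pending_req, wait_queues, wq_push and 
wake_for_object are hypothetical and exist only for this example. It 
assumes one list per blocking reason, so that waking the requests for a 
recovered object only walks the requests blocked on object recovery and 
never touches those blocked on an epoch mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request type, not sheepdog's actual struct request. */
struct pending_req {
	uint64_t oid;             /* object the request targets */
	uint32_t epoch;           /* epoch the request was issued in */
	struct pending_req *next; /* singly linked wait list */
};

/* One wait list per reason a request is blocked (assumed layout). */
struct wait_queues {
	struct pending_req *wait_epoch;  /* blocked on epoch inconsistency */
	struct pending_req *wait_object; /* blocked on an object in recovery */
};

/* Put a request at the head of one of the wait lists. */
static void wq_push(struct pending_req **head, struct pending_req *req)
{
	assert(req != NULL);
	req->next = *head;
	*head = req;
}

/*
 * Detach every request waiting for `oid` from the object wait list and
 * return the detached requests as a list. Only wait_object is walked;
 * *visited reports how many entries were examined.
 */
static struct pending_req *wake_for_object(struct wait_queues *wq,
					   uint64_t oid, size_t *visited)
{
	struct pending_req **pp = &wq->wait_object;
	struct pending_req *woken = NULL;

	*visited = 0;
	while (*pp) {
		struct pending_req *req = *pp;

		(*visited)++;
		if (req->oid == oid) {
			*pp = req->next;   /* unlink from the wait list */
			req->next = woken; /* link into the woken list */
			woken = req;
		} else {
			pp = &req->next;
		}
	}
	return woken;
}
```

With a single shared list, the same walk would also have to step over 
every request that is blocked for a different reason, which is the cost 
described above.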

thanks,

levin


