On 05/22/2012 03:10 PM, Liu Yuan wrote:
> On 05/22/2012 10:51 AM, levin li wrote:
>
>> There are many cases where a request being retried does not wait but
>> is directly put back into the request queue to run again, which can
>> make the CPU too busy.
>>
>> In our cluster with 960 nodes, when 10 nodes leave and there is heavy
>> IO from the VMs, recovery does not run well, because there are too
>> many pending requests in the request queue retrying IO, and the CPU
>> is too busy to process the recovery requests.
>>
>> There is also a race condition in recovery which keeps nodes retrying
>> to recover a single object and makes the recovery work hang there.
>>
>> We should not retry a request at the moment it fails; instead we
>> should put it into a queue to sleep until the epoch or whatever else
>> the request needs is satisfied, and then wake it up to retry.
>>
>
> From a skim over the patch set, I think it needs a rethink of list
> usage. It takes several lists to implement the 'wait queue' concept,
> on both the requester and requestee nodes, which looks unnecessary to
> me, and the code does not comment on why more than one queue is used.
> What makes them different from each other?
>
> It would be much better if we could unify those queues into a single
> queue, which we rely on to queue requests, resume requests, and flush
> requests.
>
> Thanks,
> Yuan

Using one list is simple and easy to implement, and that is what I did
at first, but there is a problem: under heavy IO there can be many
pending requests, blocked either by epoch inconsistency or by objects
still in recovery, so the list can grow very large, and it is
inefficient to traverse such a large list to find which requests can be
woken up during recovery.

In my implementation, after recovering one object, sheep tries to wake
up the pending requests that need that specific object. That requires
traversing the list, and if everything sits in one list, it is really
time consuming and inefficient.

thanks,
levin
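
To make the trade-off concrete, below is a minimal sketch of the idea of
keeping waiters grouped by object instead of in one global list, so that
recovering one object only walks the requests waiting for that object.
This is not the patch set's actual code; the names (pending_req,
wait_bucket, wait_on_object, wakeup_object_waiters) and the fixed-size
hash table are made up for illustration.

/*
 * Sketch: pending requests are grouped into per-object wait lists kept
 * in a small hash table keyed by object id, so that after recovering
 * one object only the requests waiting for that object are walked,
 * instead of one global list of all pending requests.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NR_BUCKETS 64

struct pending_req {
	uint64_t oid;                /* object this request is waiting on */
	struct pending_req *next;
	/* ... the original request data would live here ... */
};

static struct pending_req *wait_bucket[NR_BUCKETS];

static unsigned bucket_of(uint64_t oid)
{
	return oid % NR_BUCKETS;
}

/* Queue a request until the object it needs has been recovered. */
static void wait_on_object(struct pending_req *req, uint64_t oid)
{
	unsigned b = bucket_of(oid);

	req->oid = oid;
	req->next = wait_bucket[b];
	wait_bucket[b] = req;
}

/*
 * Called once an object has been recovered: wake only the requests
 * that were waiting for this object, leaving the rest queued.
 */
static void wakeup_object_waiters(uint64_t oid)
{
	unsigned b = bucket_of(oid);
	struct pending_req **pp = &wait_bucket[b];

	while (*pp) {
		struct pending_req *req = *pp;

		if (req->oid == oid) {
			*pp = req->next;
			printf("resuming request for object %lx\n",
			       (unsigned long)req->oid);
			free(req);   /* stand-in for requeueing the request */
		} else {
			pp = &req->next;
		}
	}
}

int main(void)
{
	struct pending_req *r1 = calloc(1, sizeof(*r1));
	struct pending_req *r2 = calloc(1, sizeof(*r2));

	wait_on_object(r1, 0xabc);
	wait_on_object(r2, 0xdef);

	/* Recovery of object 0xabc wakes only r1; r2 stays queued. */
	wakeup_object_waiters(0xabc);

	free(r2);   /* r2 was never woken; free it before exit */
	return 0;
}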