On 05/22/2012 03:10 PM, Liu Yuan wrote:
> On 05/22/2012 10:51 AM, levin li wrote:
>
>> There are many cases where a request being retried does not wait but
>> is directly put back into the request queue to run again, which can
>> make the CPU too busy.
>>
>> In our cluster with 960 nodes, when 10 nodes leave and there is heavy
>> IO from the VMs, recovery does not run well, because there are too
>> many pending requests in the request queue retrying IO, and the CPU
>> is too busy to process the recovery requests.
>>
>> There is also a race condition in recovery which keeps nodes retrying
>> to recover a single object and makes the recovery work hang there.
>>
>> We should not retry a request at the moment it fails; instead we
>> should put it into a queue to sleep until the epoch or whatever else
>> the request needs is satisfied, and then wake it up to retry.
>>
>
> From a skim over the patch set, I think it needs a rethink of list
> usage. It takes several lists to implement the 'wait queue' concept,
> on both the requester and requestee nodes, which looks unnecessary to
> me, and the code does not comment on why more than one queue is used.
> What makes them different from each other?
>
> It would be much better if we could unify those queues into a single
> queue, which we rely on to queue requests, resume requests, and flush
> requests.
>
> Thanks,
> Yuan

Using one list is simple and easy to implement, and that is what I did
at first, but there is a problem: under heavy IO there can be many
pending requests, blocked either by epoch inconsistency or by objects
still in recovery, so the list can grow very large, and it is
inefficient to traverse such a large list to find which requests can be
woken up during recovery.

In my implementation, after recovering one object, sheep tries to wake
up the pending requests that need that specific object. That requires
traversing the list, and if everything sits in one list, it is really
time consuming and inefficient.

thanks,
levin
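
To make the trade-off concrete, below is a minimal sketch of the idea of
keeping waiters grouped by object instead of in one global list, so that
recovering one object only walks the requests waiting for that object.
This is not the patch set's actual code; the names (pending_req,
wait_bucket, wait_on_object, wakeup_object_waiters) and the fixed-size
hash table are made up for illustration.

/*
 * Sketch: pending requests are grouped into per-object wait lists kept
 * in a small hash table keyed by object id, so that after recovering
 * one object only the requests waiting for that object are walked,
 * instead of one global list of all pending requests.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NR_BUCKETS 64

struct pending_req {
	uint64_t oid;                /* object this request is waiting on */
	struct pending_req *next;
	/* ... the original request data would live here ... */
};

static struct pending_req *wait_bucket[NR_BUCKETS];

static unsigned bucket_of(uint64_t oid)
{
	return oid % NR_BUCKETS;
}

/* Queue a request until the object it needs has been recovered. */
static void wait_on_object(struct pending_req *req, uint64_t oid)
{
	unsigned b = bucket_of(oid);

	req->oid = oid;
	req->next = wait_bucket[b];
	wait_bucket[b] = req;
}

/*
 * Called once an object has been recovered: wake only the requests
 * that were waiting for this object, leaving the rest queued.
 */
static void wakeup_object_waiters(uint64_t oid)
{
	unsigned b = bucket_of(oid);
	struct pending_req **pp = &wait_bucket[b];

	while (*pp) {
		struct pending_req *req = *pp;

		if (req->oid == oid) {
			*pp = req->next;
			printf("resuming request for object %lx\n",
			       (unsigned long)req->oid);
			free(req);   /* stand-in for requeueing the request */
		} else {
			pp = &req->next;
		}
	}
}

int main(void)
{
	struct pending_req *r1 = calloc(1, sizeof(*r1));
	struct pending_req *r2 = calloc(1, sizeof(*r2));

	wait_on_object(r1, 0xabc);
	wait_on_object(r2, 0xdef);

	/* Recovery of object 0xabc wakes only r1; r2 stays queued. */
	wakeup_object_waiters(0xabc);

	free(r2);   /* r2 was never woken; free it before exit */
	return 0;
}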