[sheepdog] [PATCH v2 0/7] make IO requests to wait in recovery instead of busy retrying

Liu Yuan namei.unix at gmail.com
Thu May 24 05:12:34 CEST 2012


On 05/23/2012 03:02 PM, levin li wrote:

> There are many cases where a failed request is retried without waiting:
> it is put straight back into the request queue to run again, which can
> keep the CPU too busy.
> 
> In our cluster with 960 nodes, when 10 nodes leave while the VMs are
> issuing heavy IO requests, recovery doesn't run well because the request
> queue holds too many pending requests that are retrying IO, which makes
> the CPU too busy to process the recovery requests.
> 
> Also, there's a race condition in recovery which keeps nodes retrying
> to recover a single object and makes the recovery work hang there.
> 
> We should not retry a request at the moment it fails; instead we should
> put it into a queue and let it sleep until the epoch or other
> requirements of this request are met, then wake it up to retry.
> 
> There are 4 cases in which a request needs to wait before retrying:
> 
> 	1. epoch of request sender is older than system epoch
> 
> 	   In this case, we respond to the sender with SD_RES_OLD_NODE_VER to
> 	   make the gateway retry; the gateway then puts the request into
> 	   wait_rw_queue to wait for its system epoch to change.
> 
> 	2. epoch of request sender is newer than system epoch
> 
> 	   In this case, we put the request into wait_rw_queue to wait for
> 	   the system epoch to change, then retry this request locally.
> 
> 	3. object requested doesn't exist and recovery work is at RW_INIT state
> 	
> 	   In this case, we make is_recoverying_oid() check whether the
> 	   requested object exists; if so, we process the request, otherwise
> 	   we put the request into wait_rw_queue to wait for the recovery
> 	   work to start.
> 
> 	4. object requested doesn't exist and is pending for recovery.
> 	   
> 	   In this case, we put the request into wait_obj_queue, and every time
> 	   we recover an object we try to wake up a request in wait_obj_queue
> 	   that is requesting the object just recovered.
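
To make the four cases above easier to review, here is a minimal sketch
of the decision they describe. It is not code from the patch set: the
enum wait_decision, struct req_ctx, classify_request(), RW_IDLE and
RW_RUN are hypothetical names invented for illustration; only
SD_RES_OLD_NODE_VER, wait_rw_queue, wait_obj_queue and RW_INIT come from
the description, and the order of the checks is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting a request; not a sheepdog type. */
enum wait_decision {
	REQ_PROCESS,		/* nothing to wait for, handle it now */
	REQ_REPLY_OLD_VER,	/* case 1: reply SD_RES_OLD_NODE_VER */
	REQ_WAIT_RW,		/* cases 2 and 3: sleep in wait_rw_queue */
	REQ_WAIT_OBJ,		/* case 4: sleep in wait_obj_queue */
};

/* Only RW_INIT is named in the description; the others are assumed. */
enum rw_state { RW_IDLE, RW_INIT, RW_RUN };

/* Hypothetical snapshot of what the receiving node knows. */
struct req_ctx {
	uint32_t req_epoch;	/* epoch carried by the request sender */
	uint32_t sys_epoch;	/* epoch of the node handling the request */
	enum rw_state rw_state;	/* state of the recovery work */
	bool obj_exists;	/* requested object is present locally */
	bool obj_pending;	/* requested object is queued for recovery */
};

/* Decide whether a request runs now or sleeps until a wake-up event. */
static enum wait_decision classify_request(const struct req_ctx *c)
{
	if (c->req_epoch < c->sys_epoch)
		return REQ_REPLY_OLD_VER;	/* case 1: sender is stale */
	if (c->req_epoch > c->sys_epoch)
		return REQ_WAIT_RW;		/* case 2: we are stale */
	if (c->obj_exists)
		return REQ_PROCESS;
	if (c->rw_state == RW_INIT)
		return REQ_WAIT_RW;		/* case 3: recovery not started */
	if (c->obj_pending)
		return REQ_WAIT_OBJ;		/* case 4: wait for this object */
	return REQ_PROCESS;
}
```

The point of the sketch is that a request which cannot proceed is parked
in a queue instead of being pushed back into the request queue, so it
uses no CPU until an epoch change or an object recovery wakes it up.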


I have created a branch called 'live-lock-fix' to get a better review of
this patch set.

Generally speaking, this patch set tries to fix a live lock (busy
retrying) on some mutually influencing nodes in the recovery phase, which
we observed in our simulated 960-node cluster.

Thanks,
Yuan