[sheepdog] [PATCH v3 0/8] make IO requests to wait in recovery instead of busy retrying

levin li levin108 at gmail.com
Thu May 24 05:37:43 CEST 2012


From: levin li <xingke.lwp at taobao.com>

v2 --> v3:

1. port list_splice_tail_init() from the Linux kernel for use in
   clear_wait_obj_requests()
2. move process_event_request_queue() out of the loops
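
For readers unfamiliar with the helper named in item 1, here is a minimal
sketch of what list_splice_tail_init() does on a Linux-style circular list:
it appends every entry of one list to the tail of another and leaves the
source list empty and reusable. This is a standalone illustration, not the
code from the patch; list_splice_between() is a hypothetical name for the
internal helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on Linux's list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static inline void list_add_tail(struct list_head *entry,
				 struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical helper name: link all entries of a non-empty 'list'
 * between 'prev' and 'next'. */
static inline void list_splice_between(const struct list_head *list,
				       struct list_head *prev,
				       struct list_head *next)
{
	struct list_head *first = list->next;
	struct list_head *last = list->prev;

	first->prev = prev;
	prev->next = first;
	last->next = next;
	next->prev = last;
}

/* Move every entry of 'list' to the tail of 'head', then reinitialise
 * 'list' so it is empty and can be reused. */
static inline void list_splice_tail_init(struct list_head *list,
					 struct list_head *head)
{
	assert(list != head);
	if (!list_empty(list)) {
		list_splice_between(list, head->prev, head);
		INIT_LIST_HEAD(list);
	}
}
```

Splicing moves a whole wait queue in constant time, which is presumably why
it suits clearing a queue of waiting requests in one step.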

In many cases a failed request is retried without waiting: it is put
straight back into the request queue to run again, which can keep the
CPU too busy.

In our cluster of 960 nodes, when 10 nodes leave while there is heavy
IO from the VMs, recovery doesn't run well: the request queue fills up
with pending IO requests that keep retrying, and the CPU is too busy to
process the recovery requests.

There is also a race condition in recovery that keeps nodes retrying to
recover a single object, which makes the recovery work hang there.

We should not retry a request the moment it fails. Instead, we should
put it into a queue and let it sleep until the epoch changes or whatever
else the request needs is satisfied, then wake it up to retry.
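
The park-and-wake idea can be sketched roughly as follows. This is only an
illustration under assumptions, not the code in the series: struct request,
struct req_queue, queue_push(), queue_pop(), park_request() and
update_epoch() are hypothetical names; only wait_rw_queue comes from the
description in this letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request record with only the fields this sketch needs. */
struct request {
	uint32_t epoch;        /* epoch the sender believed was current */
	struct request *next;  /* FIFO link */
};

/* Hypothetical FIFO of requests. */
struct req_queue {
	struct request *head, *tail;
	size_t len;
};

static void queue_push(struct req_queue *q, struct request *req)
{
	req->next = NULL;
	if (q->tail)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
	q->len++;
}

static struct request *queue_pop(struct req_queue *q)
{
	struct request *req = q->head;

	if (!req)
		return NULL;
	q->head = req->next;
	if (!q->head)
		q->tail = NULL;
	req->next = NULL;
	q->len--;
	return req;
}

static struct req_queue request_queue; /* requests ready to run */
static struct req_queue wait_rw_queue; /* requests asleep until the epoch changes */
static uint32_t sys_epoch = 1;

/* Instead of requeueing a failed request at once (busy retry), park it. */
static void park_request(struct request *req)
{
	assert(req != NULL);
	queue_push(&wait_rw_queue, req);
}

/* On an epoch change, move every sleeper back to the run queue in FIFO
 * order and report how many were woken. */
static size_t update_epoch(uint32_t new_epoch)
{
	size_t woken = 0;
	struct request *req;

	sys_epoch = new_epoch;
	while ((req = queue_pop(&wait_rw_queue)) != NULL) {
		queue_push(&request_queue, req);
		woken++;
	}
	return woken;
}
```

A parked request costs no CPU while it waits; work is done only once, when
the condition it waits for actually changes.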

There are 4 cases in which a request needs to wait before retrying:

	1. epoch of request sender is older than system epoch

	   In this case, we respond to the sender with SD_RES_OLD_NODE_VER to
	   make the gateway retry; the gateway then puts the request into
	   wait_rw_queue to wait for its system epoch to change.

	2. epoch of request sender is newer than system epoch

	   In this case, we put the request into wait_rw_queue to wait for
	   the local system epoch to change, then retry the request locally.

	3. object requested doesn't exist and recovery work is at RW_INIT state
	
	   In this case, we make is_recoverying_oid() check whether the
	   requested object exists. If it does, we process the request; if
	   not, we put the request into wait_rw_queue to wait for the
	   recovery work to start.

	4. object requested doesn't exist and is pending for recovery.
	   
	   In this case, we put the request into wait_obj_queue, and every
	   time we recover an object we try to wake up a request in
	   wait_obj_queue that is waiting for the object just recovered.


levin li (8):
  sheep: port list_splice_tail_init() from linux kernel
  sheep: make requests with new epoch sleep until epoch is updated
  sheep: make gateway to retry when received SD_RES_OLD_NODE_VER
  recovery: make IO request to wait when recovery is in RW_INIT
  recovery: make IO request to wait when the requested object is in
    recovery
  recovery: clear the object wait queue when new recovery work comes
  recovery: fix a race condition in recovery
  sheep: make gateway requests only retry in io_op_done()

 include/list.h           |    9 +++++
 include/sheepdog_proto.h |    1 +
 sheep/group.c            |    2 ++
 sheep/recovery.c         |   50 ++++++++++++++++++++++----
 sheep/sdnet.c            |   88 +++++++++++++++++++++++++++++++++++++---------
 sheep/sheep_priv.h       |    5 +++
 6 files changed, 133 insertions(+), 22 deletions(-)

-- 
1.7.10
