[sheepdog] [PATCH v5] sheep/recovery: multi-threading recovery process
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Feb 7 07:26:53 CET 2014
At Thu, 6 Feb 2014 17:18:57 +0800,
Liu Yuan wrote:
>
> Rationale for multi-threaded recovery:
>
> 1. When one node is added, we find that all the VMs on the other nodes are
> noticeably affected until 50% of the data has been transferred to the new node.
>
> 2. For a node failure, the running VMs might not have problems, but boosting
> the recovery process benefits VM I/O, since writes are less likely to be
> blocked, and it also improves reliability.
>
> 3. A disk failure in a node is similar to adding a node: all the data on the
> broken disk is recovered onto the other disks of that node. Speedy recovery
> not only improves data reliability but also blocks fewer writes to the lost
> data.
>
> The oid scheduling algorithm is left intact; we simply add multi-threading on
> top of the current recovery algorithm, with minimal changes:
>
> - we still have the ->oids array to denote the oids to be recovered
> - we start up 2 * nr_disks threads for recovery
> - the tricky part is that, for events involving multiple nodes/disks, we need
> to wait for all running threads to complete before starting the next recovery
> event (see the standalone sketch below)
>
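A minimal standalone sketch of that scheme, using plain pthreads rather than
sheep's actual work-queue machinery (struct recovery_work, recover_oid() and
run_recovery() here are illustrative placeholders, not the patched code):
2 * nr_disks workers pull oids from the shared array, and the caller joins
them all before the next recovery event is allowed to start.

#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct recovery_work {
	uint64_t *oids;		/* oids scheduled for recovery */
	size_t nr_oids;
	size_t next;		/* index of the next oid to hand out */
	pthread_mutex_t lock;	/* protects 'next' */
};

/* Placeholder for the real per-object recovery (fetch from peer, store locally). */
static void recover_oid(uint64_t oid)
{
	printf("recovering %016" PRIx64 "\n", oid);
}

static void *recovery_worker(void *arg)
{
	struct recovery_work *rw = arg;

	for (;;) {
		uint64_t oid;

		pthread_mutex_lock(&rw->lock);
		if (rw->next >= rw->nr_oids) {
			pthread_mutex_unlock(&rw->lock);
			break;		/* nothing left to recover */
		}
		oid = rw->oids[rw->next++];
		pthread_mutex_unlock(&rw->lock);

		recover_oid(oid);
	}
	return NULL;
}

/* Run one recovery event with 2 * nr_disks threads and wait for all of them. */
static void run_recovery(struct recovery_work *rw, int nr_disks)
{
	int nr_threads = 2 * nr_disks;
	pthread_t *tid = calloc(nr_threads, sizeof(*tid));

	for (int i = 0; i < nr_threads; i++)
		pthread_create(&tid[i], NULL, recovery_worker, rw);

	/* The next recovery event must not start before every worker has finished. */
	for (int i = 0; i < nr_threads; i++)
		pthread_join(tid[i], NULL);

	free(tid);
}

int main(void)
{
	uint64_t oids[] = { 0xdeadbeef00000001, 0xdeadbeef00000002 };
	struct recovery_work rw = {
		.oids = oids,
		.nr_oids = 2,
		.lock = PTHREAD_MUTEX_INITIALIZER,
	};

	run_recovery(&rw, 2);	/* pretend this node has 2 disks */
	return 0;
}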
> This patch passes "./check -g md -md" on my local box
>
> Signed-off-by: Liu Yuan <namei.unix at gmail.com>
> ---
> v5:
> - remove running_threads_nr
> v4:
> - fix the lfind() call in oid_in_recovery() so it correctly determines whether
> an oid is in the recovery list, to pass tests/func/010 (see the lfind() example
> below)
> - add a comment to run_next_rw() explaining why we check running_threads_nr > 1.
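For reference, a minimal standalone example of how lfind(3) can answer that
membership question; oid_in_recovery_list() and oid_cmp() are illustrative
placeholders, not the actual sheep functions:

#include <search.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* lfind() comparator: return 0 when the key matches the element. */
static int oid_cmp(const void *key, const void *elem)
{
	const uint64_t *a = key, *b = elem;

	return *a == *b ? 0 : 1;
}

/* True if 'oid' is among the objects still queued for recovery. */
static bool oid_in_recovery_list(uint64_t oid, uint64_t *oids, size_t nr_oids)
{
	return lfind(&oid, oids, &nr_oids, sizeof(oid), oid_cmp) != NULL;
}

int main(void)
{
	uint64_t oids[] = { 1, 2, 3 };

	printf("%d\n", oid_in_recovery_list(2, oids, 3));	/* 1: found */
	printf("%d\n", oid_in_recovery_list(9, oids, 3));	/* 0: not found */
	return 0;
}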
>
> sheep/md.c | 9 ++++--
> sheep/recovery.c | 91 ++++++++++++++++++++++++++++++++++++++++--------------
> sheep/sheep.c | 2 +-
> sheep/sheep_priv.h | 1 +
> 4 files changed, 77 insertions(+), 26 deletions(-)
Applied, thanks.
Kazutaka