[sheepdog] [PATCH 08/11] Doc. "Sheepdog Basic" add chapter "fail over"

Tue Oct 22 07:59:37 CEST 2013

On Sun, Oct 20, 2013 at 10:41:01AM +0200, Valerio Pachera wrote:
> Signed-off-by: Valerio Pachera <sirio81 at gmail.com>
> ---
>  doc/fail_over.rst |   36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
>  create mode 100644 doc/fail_over.rst
> 
> diff --git a/doc/fail_over.rst b/doc/fail_over.rst
> new file mode 100644
> index 0000000..892d79d
> --- /dev/null
> +++ b/doc/fail_over.rst
> @@ -0,0 +1,36 @@
> +Fail Over
> +=========
> +
> +Now we are able to manage guests on our cluster and we want to check if it's
> +really able to survive a node loss.
> +Start a guest on any of the nodes.
> +Find the ID of the node you wish to fail with *'dog node list'*
> +(not the node where the guest is running, of course).
> +Then kill the node:
> +
> +::
> +
> +    # dog node kill 3
> +
> +The guest is still running without any problem, and 'dog node list' will
> +show that one node is missing.
> +
> +But how do we know if sheepdog is recovering the "lost" data?
> +
> +*(At this very moment, some objects have only 1 copy instead of 2.
> +The second copy has to be rebuilt on the active nodes)*.
> +
> +::
> +
> +    # dog node recovery
> +    Nodes In Recovery:
> +    Id   Host:Port         V-Nodes       Zone
> +    0   192.168.2.41:7000      50  688040128
> +    1   192.168.2.42:7000      50  704817344
> +    2   192.168.2.43:7000      92  721594560
> +
> +Here you can see which nodes are receiving data.
> +Once done, the list will be empty.

The output of 'dog node recovery' has also changed in the master branch.
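
If it helps the chapter, waiting for the list to become empty could be
scripted. This is only a rough sketch: it assumes each node row starts with
an ID followed by Host:Port, as in the output quoted above, and the pattern
may need adjusting for the output in master. The function names
count_recovering and wait_for_recovery are hypothetical helpers, not part
of dog.

```shell
# Count node rows in `dog node recovery` output read from stdin.
# Assumption: a row looks like "  0   192.168.2.41:7000   50  688040128".
count_recovering() {
    # grep -c still prints 0 when nothing matches; "|| true" hides its exit status 1.
    grep -cE '^[[:space:]]*[0-9]+[[:space:]]+[^[:space:]]+:[0-9]+([[:space:]]|$)' || true
}

# Poll until no node is listed as recovering.
# $1 is the polling interval in seconds (default 5).
# Note: if dog itself fails, the count is 0 and the loop ends as well.
wait_for_recovery() {
    interval="${1:-5}"
    while [ "$(dog node recovery | count_recovering)" -gt 0 ]; do
        sleep "$interval"
    done
}
```

After sourcing this, 'wait_for_recovery 10' would check every ten seconds
and return once no node rows are printed.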

> +**IMPORTANT:**
> +do not remove other nodes from the cluster during recovery!

Actually, it is okay to remove multiple nodes from the cluster to simulate a
group failure; sheepdog can handle that case gracefully.

Thanks
Yuan