[sheepdog-users] is --nohalt dangerous?

Liu Yuan namei.unix at gmail.com
Wed Jul 18 08:08:45 CEST 2012


On 07/18/2012 01:45 PM, Dietmar Maurer wrote:
> I have a small cluster with only 3 nodes, and I want to store 3 copies:
>
> # cluster format --copies 3
>
> But as soon as one node dies the IO gets halted. To prevent that one can
> use:
>
> # cluster format --copies 3 --nohalt
>
> The question is why that is not the default behavior? Is that dangerous?
> If so, why?

To quote from commit 9b6102ce:
=======================================
    sheep: introduce SD_STATUS_HALT

    Currently, sheepdog will serve IO requests even if number of nodes
    is less than 'copies'.

    When the number of the nodes (or zones) is less than the copies
    specified by collie-cluster-format command, the sheepdog cluster should
    stop serving IO requests.

    This is necessary to solve the below subtle case:

    + good nodes, - failed nodes.

    0       1      2     3
    +       -      -     +
    +  -->  - -->  - --> +
    +       +      -     # <-- permanently down.
            ^
            |
    this node has the latest data

    at stage 3, we will have a cluster recovered without the data
    tracked at stage 1.

    When the nodes are in the SD_STATUS_HALT, the sheepdog can also
    serve configuration change and do the recovery job.
==========================
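The failure sequence in the commit message can be replayed with a toy model. This is a hypothetical sketch, not sheepdog code: the `Cluster` class, its fields, and the idea of tracking a single object as an integer version are illustrative assumptions. It only shows why refusing IO while fewer than `copies` nodes are up keeps an acknowledged write from living on a single node that later disappears for good.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical toy model of a replicated store (not sheepdog's code).

    Each node holds the version number of one object; a write is applied
    to every node that is currently up.
    """
    copies: int
    halt: bool  # True models the default behavior, False models --nohalt
    version: dict[int, int] = field(default_factory=dict)  # node id -> version
    up: set[int] = field(default_factory=set)
    acked: int = 0  # highest version acknowledged to a client

    def add_node(self, node: int) -> None:
        self.version.setdefault(node, 0)
        self.up.add(node)

    def write(self) -> bool:
        """Try to write a new version; return True if it was acknowledged."""
        if not self.up:
            return False  # no live node can store the write at all
        if self.halt and len(self.up) < self.copies:
            return False  # halted: refuse IO while fewer than `copies` nodes are up
        new = self.acked + 1
        for node in self.up:
            self.version[node] = new
        self.acked = new
        return True

    def recovered_version(self) -> int:
        """Newest version that the live nodes can recover."""
        return max(self.version[n] for n in self.up)


def run(halt: bool) -> tuple[int, int]:
    """Replay the stages 0..3 from the diagram.

    Returns (highest acknowledged version, version available after stage 3).
    """
    c = Cluster(copies=3, halt=halt)
    for n in (0, 1, 2):
        c.add_node(n)
    c.write()           # stage 0: all three nodes hold version 1

    c.up -= {0, 1}      # stage 1: nodes 0 and 1 fail
    c.write()           # only node 2 could take this write

    c.up.discard(2)     # stage 2: node 2 fails as well
    c.up |= {0, 1}      # stage 3: nodes 0 and 1 return, node 2 never does
    return c.acked, c.recovered_version()


for halt in (True, False):
    acked, recovered = run(halt)
    print(f"halt={halt}: acked v{acked}, recovered v{recovered}")
# halt=True: acked v1, recovered v1
# halt=False: acked v2, recovered v1
```

With halting on, the stage-1 write is refused, so the cluster that comes back at stage 3 has everything it ever acknowledged. With --nohalt the write is acknowledged while node 2 holds the only copy, and it is lost once node 2 is permanently down. The model ignores zones and the real recovery logic.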