[sheepdog-users] is --nohalt dangerous?
Dietmar Maurer
dietmar at proxmox.com
Wed Jul 18 10:05:09 CEST 2012
> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Wednesday, July 18, 2012 10:01
> To: Dietmar Maurer
> Cc: sheepdog-users at lists.wpkg.org
> Subject: Re: [sheepdog-users] is --nohalt dangerous?
>
> On 07/18/2012 03:53 PM, Dietmar Maurer wrote:
> > OK, so maybe the 2 node is a special case. What about:
> >
> > if ((sys->nr_copies > 2) &&
> >     (current_vnode_info->nr_zones <= (sys->nr_copies/2)))
> >         sys_stat_set(SD_STATUS_HALT);
>
> Sheepdog provides strong consistency for objects, so I don't think we need
> a quorum-based algorithm. Even with one copy left, the cluster still runs
> well. So I don't yet see the use case for this quorum calculation.
Does this avoid the bug pointed out in commit 9b6102ce?
> =======================================
> sheep: introduce SD_STATUS_HALT
>
> Currently, sheepdog will serve IO requests even if the number of nodes
> is less than 'copies'.
>
> When the number of the nodes (or zones) is less than the copies
> specified by collie-cluster-format command, the sheepdog cluster
> should stop serving IO requests.
>
> This is necessary to solve the below subtle case:
>
> + good nodes, - failed nodes.
>
>   0        1        2        3
>   +        -        -        +
>   +  -->   -  -->   -  -->   +
>   +        +        -        # <-- permanently down.
>            ^
>            |
>            this node has the latest data
>
> at stage 3, we will have a cluster recovered without the data
> tracked at stage 1.
>
> When the nodes are in the SD_STATUS_HALT, the sheepdog can also
> serve configuration change and do the recovery job.
> ==========================