[sheepdog-users] is --nohalt dangerous?

Dietmar Maurer dietmar at proxmox.com
Wed Jul 18 10:05:09 CEST 2012



> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Wednesday, 18 July 2012 10:01
> To: Dietmar Maurer
> Cc: sheepdog-users at lists.wpkg.org
> Subject: Re: [sheepdog-users] is --nohalt dangerous?
> 
> On 07/18/2012 03:53 PM, Dietmar Maurer wrote:
> > OK, so maybe the 2-node cluster is a special case. What about:
> >
> > 		if ((sys->nr_copies > 2) &&
> > 		    (current_vnode_info->nr_zones <= (sys->nr_copies/2)))
> > 			sys_stat_set(SD_STATUS_HALT);
> 
> Sheepdog provides strong consistency for objects, so I don't think we need
> a quorum-based algorithm. Even with only one copy left, the cluster still runs
> well. So I don't yet see the use case for this quorum calculation.

This avoids the bug pointed out in commit 9b6102ce?
> =======================================
>     sheep: introduce SD_STATUS_HALT
> 
>     Currently, sheepdog will serve IO requests even if number of nodes
>     is less than 'copies'.
>
>     When the number of the nodes (or zones) is less than the copies
>     specified by collie-cluster-format command, the sheepdog cluster
>     should stop serving IO requests.
> 
>     This is necessary to solve the below subtle case:
> 
>     + good nodes, - failed nodes.
> 
>     0       1      2     3
>     +       -      -     +
>     +  -->  - -->  - --> +
>     +       +      -     # <-- permanently down.
>             ^
>             |
>     this node has the latest data
> 
>     at stage 3, we will have a cluster recovered without the data
>     tracked at stage 1.
>
>     When the nodes are in the SD_STATUS_HALT, the sheepdog can also
>     serve configuration change and do the recovery job.
> ==========================
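
To make the comparison concrete, here is a minimal standalone sketch of the two halt conditions under discussion. It is not sheepdog code: the function names halt_strict() and halt_quorum() are hypothetical, the plain int parameters stand in for current_vnode_info->nr_zones and sys->nr_copies, and the "strict" rule is only my reading of the commit message above (halt when there are fewer zones than copies).

```c
#include <stdbool.h>

/*
 * Assumed reading of commit 9b6102ce: stop serving I/O as soon as
 * fewer zones are alive than the configured number of copies.
 */
static bool halt_strict(int nr_zones, int nr_copies)
{
	return nr_zones < nr_copies;
}

/*
 * Proposed rule from the quoted snippet: clusters with one or two
 * copies never halt; otherwise halt once the live zones are no longer
 * a strict majority of the copies (integer division, as in the
 * original expression).
 */
static bool halt_quorum(int nr_zones, int nr_copies)
{
	return nr_copies > 2 && nr_zones <= nr_copies / 2;
}
```

Assuming the diagram describes three nodes with three copies, at stage 1 only one node is alive: halt_quorum(1, 3) is true, so no writes would be accepted by that lone node, and at stage 3 halt_quorum(2, 3) is false, so the two recovered nodes could serve I/O without missing data. halt_strict(2, 3) is true, which is the behaviour that --nohalt currently switches off.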

