[Sheepdog] On gateway sheep and running a sheepdog cluster

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Dec 15 10:22:06 CET 2011


At Tue, 13 Dec 2011 14:11:34 +0000,
Chris Webb wrote:
> 
> At the moment, I think that an IO operation from a failed disk will make the
> corresponding sheep call leave_cluster(), dropping into a gateway mode where
> it forwards IO operations for the qemu processes attached to it, but doesn't
> store data any more, and presumably isn't considered part of the cluster for
> the purposes of the consistent hash ring?

Yes, right.

> 
> I wonder if it would make sense to be able to directly start a sheep daemon
> in this state, i.e. a gateway daemon which doesn't have an associated store
> directory.
> 
> Nodes in sheepdog clusters will probably have multiple drives, and the
> natural thing to do with these is to run one sheep daemon per drive.
> (Running a single daemon on top of a RAID array is wasteful, as sheepdog
> does its own data replication.) For example, I've been testing on machines
> with 6*2TB drives each, running as sheep -p 700[0-5] -D against the
> filesystems.
> 
> If the first disk dies, sheep -p 7000 leaves the cluster but continues
> forwarding for the local qemu processes. However, when I replace the disk, I
> can't kill and restart it on the new, clean filesystem because all the VMs
> will lose their block storage.
> 
> However, if I could start a pure gateway sheep, I could run that on port
> 7000, and use 700[1-6] for data storage sheep, all of which are safe to kill
> and restart. Conversely, the gateway sheep doesn't have associated storage,
> so doesn't need to be restarted.
> 
> This would also enable non-storage nodes to have resilient qemu processes
> running on them, connecting to a local gateway sheep which forwards to the
> storage nodes in the ring. This is a (presumably easier and mostly already
> working) alternative to implementing sheepdog failover support in qemu.
> 
> Does this make sense?

It makes a lot of sense, and would be a much better way to remove the
gateway as a single point of failure (SPOF) than implementing
connection failover in the qemu block driver!

I don't think it would be difficult to add a gateway mode option to the
sheep command line. I'll implement it after releasing 0.3.0. :)
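
To make the idea concrete, here is a rough Python sketch of a gateway
that stays off the consistent hash ring and only forwards I/O. It is
an illustration under assumptions, not sheepdog's actual code: the
names Ring, Gateway, locate and ring_hash are hypothetical, the hash
function and the 3-copy default are placeholders, and the "disks" are
plain dicts. Only the port layout (gateway on 7000, storage on
7001-7006) comes from the mail above.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # Stable 64-bit hash; a placeholder, not the hash sheepdog uses.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent hash ring that contains storage nodes only."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.points = []  # sorted list of (hash, node) virtual nodes
        for node in nodes:
            self.join(node)

    def join(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.points, (ring_hash(f"{node}#{i}"), node))

    def leave(self, node):
        self.points = [p for p in self.points if p[1] != node]

    def locate(self, oid, copies=3):
        """Return the first `copies` distinct nodes clockwise from hash(oid)."""
        if not self.points:
            return []
        start = bisect.bisect(self.points, (ring_hash(oid), ""))
        found = []
        for k in range(len(self.points)):
            node = self.points[(start + k) % len(self.points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


class Gateway:
    """Stores nothing itself; forwards reads and writes to ring members."""

    def __init__(self, ring, stores):
        self.ring = ring
        self.stores = stores  # node name -> dict standing in for its disk

    def write(self, oid, data):
        targets = self.ring.locate(oid)
        for node in targets:
            self.stores[node][oid] = data
        return targets

    def read(self, oid):
        for node in self.ring.locate(oid):
            if oid in self.stores[node]:
                return self.stores[node][oid]
        raise KeyError(oid)


# Storage sheep on ports 7001-7006; the gateway (port 7000) never joins.
storage = [f"127.0.0.1:{port}" for port in range(7001, 7007)]
ring = Ring(storage)
stores = {node: {} for node in storage}
gateway = Gateway(ring, stores)

targets = gateway.write("object-1", b"data")
ring.leave(targets[0])  # one storage sheep is killed
assert gateway.read("object-1") == b"data"  # a surviving replica answers
```

Because the gateway is not a ring member, killing or restarting any
storage node only changes what locate() returns; the client-facing
endpoint stays up. Recovery (re-creating the lost copy on another
node) is left out of this sketch.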

Thanks,

Kazutaka


