At the moment, I think that an IO error from a failed disk makes the corresponding sheep call leave_cluster() and drop into a gateway mode: it still forwards IO operations for the qemu processes attached to it, but no longer stores data, and presumably is no longer considered part of the cluster for the purposes of the consistent hash ring?

I wonder if it would make sense to be able to start a sheep daemon directly in this state, i.e. as a gateway daemon which doesn't have an associated store directory.

Nodes in sheepdog clusters will probably have multiple drives, and the natural thing to do with these is to run one sheep daemon per drive. (Running a single daemon on top of a RAID array is wasteful, as sheepdog does its own data replication.) For example, I've been testing on machines with 6*2TB drives each, running as sheep -p 700[0-5] -D against the individual filesystems.

If the first disk dies, the sheep on port 7000 leaves the cluster but continues forwarding for the local qemu processes. However, when I replace the disk, I can't kill that sheep and restart it on the new, clean filesystem, because all the VMs attached to it would lose their block storage. If I could start a pure gateway sheep instead, I could run that on port 7000 and use 700[1-6] for the data storage sheep, all of which would then be safe to kill and restart. Conversely, the gateway sheep has no associated storage, so it never needs to be restarted.

This would also enable non-storage nodes to run resilient qemu processes, each connecting to a local gateway sheep which forwards to the storage nodes in the ring. That is a (presumably easier, and mostly already working) alternative to implementing sheepdog failover support in qemu.

Does this make sense?

Cheers, Chris.
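P.S. For concreteness, here is a sketch of the per-node layout I have in mind. The "--gateway" flag and the /mnt/sheep[1-6] mount points are hypothetical -- no such flag exists today, that's the feature being proposed -- so the script just prints the command lines rather than launching anything:

```shell
# Hypothetical layout for a node with 6 data drives:
# one data sheep per drive on ports 7001-7006, each safe to kill
# and restart when its disk is replaced, plus one storeless gateway
# sheep on port 7000 for the local qemu processes to attach to.
# ("--gateway" is a proposed flag, not an existing sheep option;
# the mount points are illustrative.)

for i in 1 2 3 4 5 6; do
    echo "sheep -p 700$i -D /mnt/sheep$i"
done

echo "sheep --gateway -p 7000"
```

The point of the split is that the daemon qemu connects to (port 7000) is exactly the one with no on-disk state, so disk replacement never forces a restart of the process the VMs depend on.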