[sheepdog-users] stability and architecture

Corin Langosch info at corinlangosch.com
Wed Jan 14 10:58:12 CET 2015


Hi guys,

I'm thinking about switching from Ceph to sheepdog for the backend storage of my VMs. My main reasons are erasure coding
support and local caching. I set up a test cluster (0.9.1 compiled from sources, with zookeeper) of 6 machines, each having
1TB of SSD storage for sheepdog. Would you generally recommend using it in a production system with several hundred VMs?
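
For reference, each node runs roughly the following (the zookeeper hosts are placeholders, and I formatted the cluster
for 4:2 erasure coding - please correct me if I got the flags wrong):

  sheep -c zookeeper:zk1:2181,zk2:2181,zk3:2181 /var/lib/sheepdog
  dog cluster format -c 4:2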

I noticed that killing the local sheep process (the gateway) kills all attached qemu processes. Is this the desired
behavior, or is "proper" connection handling (e.g. reconnect on failure) just not implemented yet? Should rolling
upgrades of sheepdog without downtime/restarts of the VMs be possible?
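
For context, the guests attach through the local gateway roughly like this (the vdi name is just an example):

  qemu-system-x86_64 -drive file=sheepdog:vm01,if=virtio ...

so as soon as that local sheep dies, the guest immediately loses its disk.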

As for monitoring, how can I get the cluster "health", something like what "ceph health" or "ceph -w" outputs? So
basically: which nodes are down, and how many (and which) objects need recovery? I guess the first one cannot be
reported, as there's no such concept of "registered" nodes in sheepdog as there is in Ceph?
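
The closest I have found so far are the commands below, but none of them seems to give a single health summary the way
"ceph health" does (please correct me if I'm missing something):

  dog cluster info      # cluster state and epoch history
  dog node list         # currently joined nodes
  dog node recovery     # nodes with recovery in progress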

Does sheepdog provide periodic data integrity checking (scrubbing/deep scrubbing), or is it planned? Is data integrity
checked when objects are assembled from their individual chunks?
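
The only thing I found is a manual, per-image check, something like

  dog vdi check <vdiname>

which, if I read it correctly, verifies (and repairs) a single vdi on demand rather than scrubbing in the background.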

If I understand correctly, sheepdog distributes an image across any number of nodes. So if I have 1000 nodes, parts of
the image can be on up to 1000 nodes (not very likely, but you get the point). Now if I have 4:2 redundancy, only 2 of
these 1000 nodes are allowed to fail (in the worst case) for the image to still be accessible. Is there any way to limit
the number of nodes used for an image? For example, limit an image to 100 nodes, to reduce the possibility of
failure/downtime?
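
To make the question concrete, here is the back-of-the-envelope toy model I used (Python; it assumes every 4:2 stripe
sits on 6 nodes chosen uniformly at random and that stripes fail independently, which is of course a simplification):

from math import comb

def p_stripe_lost(n_nodes, n_failed, data=4, parity=2):
    # Hypergeometric tail: a 4:2 stripe (6 chunks on 6 distinct nodes)
    # is lost when more than `parity` of its chunk holders have failed.
    k = data + parity
    return sum(comb(n_failed, i) * comb(n_nodes - n_failed, k - i)
               for i in range(parity + 1, k + 1)) / comb(n_nodes, k)

def p_image_lost(n_nodes, n_failed, n_stripes):
    # An image dies if any of its (assumed independent) stripes dies.
    return 1 - (1 - p_stripe_lost(n_nodes, n_failed)) ** n_stripes

# 3 simultaneous node failures, ~100GB image = 25600 stripes of 4MB
print(p_image_lost(1000, 3, 25600))           # spread over all 1000 nodes
# if the image were confined to a 100-node pool, only failures inside
# the pool could hurt it; with 4:2 that needs all 3 failures in the pool
print(comb(100, 3) / comb(1000, 3) * p_image_lost(100, 3, 25600))

With these toy numbers the confined image comes out roughly 3x safer, which is why I'm asking.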

Afaik sheepdog uses consistent hashing to choose which node should contain which parts of an image. Inside a node the
same is done for the individual disks (in the case of a multiple-disks-per-sheep setup). I wonder how sheepdog handles
different disk sizes here. What happens if a disk runs full (because of a non-equal distribution of the data, for
example)? Is sheepdog able to put the data on a different disk/node, or will the image become unavailable for write
access?
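
To illustrate what I mean by handling different disk sizes, here is a toy sketch of weighted consistent hashing (pure
illustration on my side, not sheepdog's actual code), where bigger disks get proportionally more virtual nodes on the
ring and therefore a proportionally larger share of the objects:

import hashlib
from bisect import bisect

def h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class WeightedRing:
    def __init__(self, disks):  # disks: {"sdb": size_in_gb, ...}
        # one virtual node per GB, so a 1TB disk covers twice as much
        # of the ring (and receives twice the objects) as a 500GB disk
        self.ring = sorted((h("%s#%d" % (name, v)), name)
                           for name, size in disks.items()
                           for v in range(size))
        self.hashes = [x for x, _ in self.ring]

    def locate(self, obj_id):
        # first virtual node clockwise from the object's hash
        i = bisect(self.hashes, h(obj_id)) % len(self.ring)
        return self.ring[i][1]

ring = WeightedRing({"sdb": 1000, "sdc": 500})  # 2:1 capacity ratio

Is sheepdog's multi-disk placement weighted roughly like this?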

Is there a detailed description somewhere of how sheepdog ensures that no stale data is read in case of node failures,
network partitioning, etc.? So basically a technical description of what sheepdog does when nodes enter/leave the
cluster.
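
To show the kind of mechanism I imagine (pure speculation on my side): something epoch-based like the toy sketch below,
where every membership change bumps a cluster epoch and requests carrying an outdated epoch are rejected. Is that
roughly the idea, or does sheepdog do something entirely different?

class ToyNode:
    def __init__(self):
        self.epoch = 1    # bumped on every node join/leave
        self.store = {}

    def membership_change(self):
        self.epoch += 1

    def read(self, req_epoch, oid):
        # a client still holding the old cluster view cannot read
        # possibly-stale data; it must refresh its view and retry
        if req_epoch != self.epoch:
            raise RuntimeError("stale epoch, refresh node list and retry")
        return self.store[oid]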

Thanks a lot in advance.

Corin


