[sheepdog-users] stability and architecture

Mon Feb 2 07:25:10 CET 2015

On Wed, Jan 14, 2015 at 10:58:12AM +0100, Corin Langosch wrote:
> Hi guys,
> 
> I'm thinking about switching from ceph to sheepdog for backend storage of my VMs. My main reasons are erasure coding
> support and local caching. I set up a test cluster (9.1 compiled from sources with zookeeper) of 6 machines, each having
> 1TB of SSD storage for sheepdog. Would you generally recommend using it in a production system with several hundred VMs?
> 
> I noticed that killing the local sheep process (gateway) kills all attached qemus. Is this desired behavior, or is
> "proper" connection handling (e.g. reconnect on failure) just not implemented yet? Should rolling upgrades of sheepdog
> without downtime/restart of VMs be possible?

Auto-reconnect is supported in newer QEMU versions via the sheepdog block
driver. You seem to be testing an old QEMU.

> 
> As for monitoring, how can I get the cluster "health", something like "ceph health" or "ceph -w" outputs? So basically
> which nodes are down and how many (which) objects need recovery? I guess the first one could not be output as there's
> no such concept of "registered" nodes as ceph has?

An online health check was discussed in the past, but no further action has
been taken to make it happen.

> Does sheepdog provide periodic data integrity checking (scrubbing/ deep scrubbing) or is it planned? Is data integrity
> checked when objects are assembled from their individual chunks?
> 
> If I understand correctly sheepdog is distributing an image to any number of nodes. So if I have 1000 nodes parts of the
> image can be on up to 1000 nodes (not very likely but you get the point). Now if I have 4:2 redundancy only 2 of these
> 1000 nodes are allowed to fail so that the image is still accessible. Is there any way to limit the number of nodes used
> for an image? For example limit an image to 100 nodes, to reduce the possibility of failure/downtime?

Why not set up more than one sheepdog cluster so that you can put the desired
number of nodes in a single cluster?
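
To put rough numbers on the 4:2 concern, here is a back-of-the-envelope sketch
in Python. It is not sheepdog code. It assumes each object's 6 strips land on 6
distinct nodes chosen uniformly at random and that objects are placed
independently of each other, which is a simplification of what consistent
hashing really does. The function names are made up for this illustration.

```python
from math import comb

def p_object_lost(n_nodes, failed, data=4, parity=2):
    """Chance that one object has more than `parity` strips on failed nodes.

    Hypergeometric model: the object's data+parity strips sit on distinct
    nodes picked uniformly at random (an assumption, not sheepdog's layout).
    """
    width = data + parity
    total = comb(n_nodes, width)
    hits = sum(comb(failed, k) * comb(n_nodes - failed, width - k)
               for k in range(parity + 1, min(width, failed) + 1))
    return hits / total

def p_image_hit(n_nodes, failed, n_objects, data=4, parity=2):
    """Chance that at least one of the image's objects is unreadable,
    treating object placements as independent (again an assumption)."""
    p = p_object_lost(n_nodes, failed, data, parity)
    return 1.0 - (1.0 - p) ** n_objects

# Example: an image made of 25600 objects (about 100GB if objects are 4MB,
# which is assumed here), with 3 nodes down at the same time.
for nodes in (100, 1000):
    print(nodes, p_image_hit(nodes, failed=3, n_objects=25600))
```

Under this model, for a fixed number of simultaneous failures, a larger cluster
makes it less likely that a given image is hit, although a larger cluster will
also see node failures more often. Both effects matter when sizing a cluster.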

> Afaik sheepdog uses the consistent hashing algorithm to choose which node should contain which parts of an image.
> Inside the node the same is done for the individual disks (in case of multiple disks per sheep setup). I wonder how
> sheepdog handles different disk sizes here? What happens if a disk runs full (because of non-equal distribution of the
> data for example) - is sheepdog able to put the data on a different disk/node or will the image be unavailable for
> write-access?

We try our best to balance the disks. If one of your disks is full, it is
either a bug or a sign that you need to add more nodes.
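
For readers unfamiliar with the idea, the following is a generic Python sketch
of consistent hashing with capacity-weighted virtual nodes. It is not
sheepdog's actual implementation: the class name, the hash function, the node
names, and the number of virtual nodes per unit of capacity are all assumptions
chosen for illustration.

```python
import bisect
import hashlib

def _hash(key):
    # 64-bit position on the ring, derived from SHA-1 (arbitrary choice here).
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, weights, vnodes_per_unit=64):
        # weights: {node_name: relative capacity}. A node with twice the
        # capacity gets twice as many virtual nodes, hence about twice the data.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node, weight in weights.items()
            for i in range(weight * vnodes_per_unit)
        )
        self._keys = [h for h, _ in self._points]

    def locate(self, obj_id, copies=1):
        """Walk clockwise from the object's hash and return the first
        `copies` distinct nodes encountered."""
        start = bisect.bisect_right(self._keys, _hash(obj_id))
        found = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found

# Hypothetical cluster: node "b" has twice the capacity of "a" and "c".
ring = Ring({"a": 1, "b": 2, "c": 1})
print(ring.locate("object-42", copies=2))
```

With weighting like this, a disk or node that fills up well before the others
points to either a skewed weight or too little total capacity, which matches
the two cases mentioned above.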

> Is there a detailed description somewhere of how sheepdog ensures no stale data is read in case of node failure, network
> partitioning etc.? So basically a technical description of what sheepdog does when nodes enter/leave the cluster.
> 
> Thank you a lot in advance.

Umm, I hope we will have a technical white paper someday.

Thanks
Yuan