[sheepdog-users] stability and architecture

Vladislav Gorbunov vadikgo at gmail.com
Wed Jan 14 17:16:38 CET 2015


> As for monitoring, how can I get the cluster "health", something like
> "ceph health" or "ceph -w" outputs?
I check the ZooKeeper status with a command run from crontab:

ps -Fp $(cat /var/run/sheep.pid) | grep -Eo 'zookeeper:\S+' \
  | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]+' \
  | sed 's/:/ /' \
  | xargs -I {} sh -c "echo ruok | nc {} | grep -q imok || echo 'Zookeeper node {} failed'"
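
A possible crontab entry for this, assuming the one-liner above is saved as a
script (the path /usr/local/bin/check_zookeeper.sh and the 5-minute schedule
are only examples), so that cron mails any output:

# example crontab entry: run the ZooKeeper check every 5 minutes
*/5 * * * * /usr/local/bin/check_zookeeper.sh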


The sheep daemon status on the CentOS 7 nodes is checked in a similar way:

cat nodeslist.txt | xargs -I {} ssh {} \
  "systemctl status sheepdog >/dev/null || echo Sheepdog node \$(hostname) failed"
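
nodeslist.txt is just one hostname or IP address per line, something like
(the names below are made up):

node1.example.com
node2.example.com
node3.example.com

You can also query sheepdog itself from any node with the dog client for a
quick look at cluster status and node membership, for example:

dog cluster info
dog node list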

2015-01-14 19:58 GMT+10:00 Corin Langosch <info at corinlangosch.com>:

> Hi guys,
>
> I'm thinking about switching from ceph to sheepdog for backend storage of
> my vms. My main reasons are erasure coding
> support and local caching. I set up a test cluster (9.1 compiled from
> sources with zookeeper) of 6 machines, each having
> 1TB of SSD storage for sheepdog. Would you generally recommend using it in
> a production system with several hundred vms?
>
> I noticed that killing the local sheep process (gateway) kills all
> attached qemus. Is this the desired behavior, or is
> "proper" connection handling (e.g. reconnect on failure) just not
> implemented yet? Should rolling upgrades of sheepdog
> without downtime/restart of vms be possible?
>
> As for monitoring, how can I get the cluster "health", something like
> "ceph health" or "ceph -w" outputs? So basically,
> which nodes are down and how many (which) objects need recovery? I guess
> the first one could not be output as there's
> no such concept of "registered" nodes as ceph has?
>
> Does sheepdog provide periodic data integrity checking (scrubbing/deep
> scrubbing) or is it planned? Is data integrity
> checked when objects are assembled from their individual chunks?
>
> If I understand correctly, sheepdog distributes an image to any number
> of nodes. So if I have 1000 nodes, parts of the
> image can be on up to 1000 nodes (not very likely, but you get the point).
> Now if I have 4:2 redundancy, only 2 of these
> 1000 nodes are allowed to fail so that the image is still accessible. Is
> there any way to limit the number of nodes used
> for an image? For example, limit an image to 100 nodes to reduce the
> possibility of failure/downtime?
>
> Afaik sheepdog uses the consistent hashing algorithm to choose which node
> should contain which parts of an image.
> Inside the node the same is done for the individual disks (in case of a
> multi-disk per sheep setup). I wonder how
> sheepdog handles different disk sizes here? What happens if a disk runs
> full (because of non-equal distribution of the
> data, for example) - is sheepdog able to put the data on a different
> disk/node or will the image be unavailable for
> write access?
>
> Is there somewhere a detailed description of how sheepdog ensures that no
> stale data is read in case of node failure, network
> partitioning, etc.? So basically a technical description of what sheepdog
> does when nodes enter/leave the cluster.
>
> Thank you a lot in advance.
>
> Corin
> --
> sheepdog-users mailing lists
> sheepdog-users at lists.wpkg.org
> https://lists.wpkg.org/mailman/listinfo/sheepdog-users
>

