<div dir="ltr">><span style="font-size:14px"> As for monitoring, how can I get the cluster "health", something like "ceph health" or "ceph -w" outputs? </span><div>I check ZooKeeper status with a command run from crontab:</div><div>


<p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">ps -Fp $(cat /var/run/sheep.pid) | grep -Eo 'zookeeper\:\S+' | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]+' | sed 's/:/ /' | xargs -I {} sh -c "echo ruok | nc {} | grep -q imok || echo 'Zookeeper node {} failed'"</p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US"><br></p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">and node status for CentOS 7 nodes:</p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">cat nodeslist.txt | xargs -I {} ssh {} "systemctl status sheepdog >/dev/null || echo Sheepdog node \$(hostname) failed"<br></p>
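<div>The ZooKeeper one-liner above can also be written as a small script, which may be easier to maintain in a cron job. This is only a sketch: the function names (zk_endpoints, zk_ok, check_zookeeper) are made up for illustration, and the 2-second nc timeout is my own addition, not something sheepdog or ZooKeeper requires.</div>

```shell
#!/bin/sh
# Sketch of the ZooKeeper check as reusable functions.
# All function names here are hypothetical helpers, not part of sheepdog.

# Print one host:port per line for every IPv4 ZooKeeper endpoint
# found in a sheep command line passed as $1.
zk_endpoints() {
    printf '%s\n' "$1" \
        | grep -Eo 'zookeeper:[^[:space:]]+' \
        | grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+'
}

# Send the "ruok" command to host $1, port $2.
# Succeeds only if the server answers "imok".
# The -w 2 timeout is an assumption so a dead node cannot hang the job.
zk_ok() {
    printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null | grep -q imok
}

# Report every endpoint from the command line in $1 that does not answer.
check_zookeeper() {
    zk_endpoints "$1" | while IFS=: read -r host port; do
        zk_ok "$host" "$port" || echo "Zookeeper node $host:$port failed"
    done
}
```

<div>From cron it could be called as check_zookeeper "$(ps -o args= -p "$(cat /var/run/sheep.pid)")", assuming the same pid file location as in the one-liner. It prints nothing when all ZooKeeper nodes answer, so cron only sends mail on failure.</div>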


</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-01-14 19:58 GMT+10:00 Corin Langosch <span dir="ltr"><<a href="mailto:info@corinlangosch.com" target="_blank">info@corinlangosch.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi guys,<br>

<br>

I'm thinking about switching from ceph to sheepdog for backend storage of my VMs. My main reasons are erasure coding<br>

support and local caching. I set up a test cluster (9.1 compiled from sources with zookeeper) of 6 machines, each having<br>

1TB of SSD storage for sheepdog. Would you generally recommend using it in a production system with several hundred VMs?<br>

<br>

I noticed that killing the local sheep process (gateway) also kills all attached qemus. Is this desired behavior, or is<br>

"proper" connection handling (e.g. reconnect on failure) just not implemented yet? Should rolling upgrades of sheepdog<br>

without downtime or restart of VMs be possible?<br>

<br>

As for monitoring, how can I get the cluster "health", something like "ceph health" or "ceph -w" outputs? So basically<br>

which nodes are down and how many (which) objects need recovery? I guess the first one could not be output as there's<br>

no such concept of "registered" nodes as ceph has?<br>

<br>

Does sheepdog provide periodic data integrity checking (scrubbing/deep scrubbing), or is it planned? Is data integrity<br>

checked when objects are assembled from their individual chunks?<br>

<br>

If I understand correctly, sheepdog distributes an image across any number of nodes. So if I have 1000 nodes, parts of the<br>

image can be on up to 1000 nodes (not very likely but you get the point). Now if I have 4:2 redundancy, only 2 of these<br>

1000 nodes are allowed to fail so that the image is still accessible. Is there any way to limit the number of nodes used<br>

for an image? For example, limit an image to 100 nodes to reduce the possibility of failure/downtime?<br>

<br>

Afaik sheepdog uses the consistent hashing algorithm to choose which node should contain which parts of an image.<br>

Inside the node the same is done for the individual disks (in case of a multiple-disks-per-sheep setup). I wonder how<br>

sheepdog handles different disk sizes here? What happens if a disk runs full (because of non-equal distribution of the<br>

data for example) - is sheepdog able to put the data on a different disk/node or will the image be unavailable for<br>

write-access?<br>

<br>

Is there a detailed description somewhere of how sheepdog ensures that old data is not read in case of node failure, network<br>

partitioning etc.? So basically a technical description of what sheepdog does when nodes enter/leave the cluster.<br>

<br>

Thank you a lot in advance.<br>

<span class="HOEnZb"><font color="#888888"><br>

Corin<br>

--<br>

sheepdog-users mailing lists<br>

<a href="mailto:sheepdog-users@lists.wpkg.org">sheepdog-users@lists.wpkg.org</a><br>

<a href="https://lists.wpkg.org/mailman/listinfo/sheepdog-users" target="_blank">https://lists.wpkg.org/mailman/listinfo/sheepdog-users</a><br>

</font></span></blockquote></div><br></div>