<div dir="ltr">><span style="font-size:14px"> As for monitoring, how can I get the cluster "health", something linke "ceph health" or "ceph -w" outputs? </span><div>I check zookeeper status with crontab command:</div><div>
<p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">ps -Fp
$(cat /var/run/sheep.pid) | grep -Eo 'zookeeper\:\S+'|grep -Eo
'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]+' | sed 's/:/ /'| xargs
-I {} sh -c "echo ruok | nc {} | grep -q imok || echo 'Zookeeper node {}
failed'"</p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US"><br></p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">and nodes status for CentOS 7 nodes:</p><p style="margin:0in;font-family:Consolas;font-size:11pt" lang="en-US">cat nodeslist.txt|xargs -I {} ssh {} "systemctl status sheepdog >/dev/null|| echo Sheepdog node \$(hostname) fail"<br></p>
2015-01-14 19:58 GMT+10:00 Corin Langosch <info@corinlangosch.com>:

Hi guys,

I'm thinking about switching from ceph to sheepdog as the backend storage for my VMs. My main reasons are erasure coding support and local caching. I set up a test cluster (0.9.1 compiled from source, with zookeeper) of 6 machines, each with 1 TB of SSD storage for sheepdog. Would you generally recommend using it in a production system with several hundred VMs?

I noticed that killing the local sheep process (the gateway) kills all attached qemus. Is this the desired behavior, or is "proper" connection handling (e.g. reconnect on failure) just not implemented yet? Should rolling upgrades of sheepdog without downtime/restart of the VMs be possible?

As for monitoring, how can I get the cluster "health", something like what "ceph health" or "ceph -w" outputs? So basically: which nodes are down, and how many (which) objects need recovery? I guess the first one could not be output, as there's no such concept of "registered" nodes as ceph has?

Does sheepdog provide periodic data integrity checking (scrubbing/deep scrubbing), or is it planned? Is data integrity checked when objects are assembled from their individual chunks?

If I understand correctly, sheepdog distributes an image across any number of nodes. So if I have 1000 nodes, parts of the image can be on up to 1000 nodes (not very likely, but you get the point). Now with 4:2 redundancy, only 2 of these 1000 nodes are allowed to fail for the image to remain accessible. Is there any way to limit the number of nodes used for an image? For example, limit an image to 100 nodes to reduce the possibility of failure/downtime?
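
To put a rough number on that (purely back-of-envelope, assuming placement is effectively uniform): limiting an image to 100 of the 1000 nodes shrinks the chance that three simultaneously failed nodes all land inside that image's node set:

# chance that 3 simultaneously failed nodes (out of 1000) all fall into
# the 100 nodes an image would be limited to: C(100,3) / C(1000,3)
echo "scale=6; (100*99*98) / (1000*999*998)" | bc
# -> .000973, i.e. about 0.1%, versus certainty for an image spread over all 1000 nodes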

AFAIK sheepdog uses consistent hashing to choose which node holds which parts of an image. Inside a node the same is done for the individual disks (in a multi-disk-per-sheep setup). How does sheepdog handle different disk sizes here? And what happens if a disk runs full (because of uneven data distribution, for example): can sheepdog put the data on a different disk/node, or will the image become unavailable for write access?
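
(Related to the disk-full worry, something like the line below could at least watch per-node usage of the object directory; nodes.txt is a hypothetical list of hostnames and /var/lib/sheepdog is only the default store path, so adjust both.)

# rough per-node usage check; nodes.txt (one hostname per line) and the
# /var/lib/sheepdog store path are assumptions, adjust to the actual setup
cat nodes.txt | xargs -I {} sh -c "printf '{}: '; ssh {} df -h /var/lib/sheepdog | tail -1"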

Is there a detailed description somewhere of how sheepdog ensures that no stale data is read in case of node failures, network partitioning etc.? So basically a technical description of what sheepdog does when nodes enter or leave the cluster.

Thank you a lot in advance.
<span class="HOEnZb"><font color="#888888"><br>
Corin<br>
--<br>
sheepdog-users mailing lists<br>
<a href="mailto:sheepdog-users@lists.wpkg.org">sheepdog-users@lists.wpkg.org</a><br>
<a href="https://lists.wpkg.org/mailman/listinfo/sheepdog-users" target="_blank">https://lists.wpkg.org/mailman/listinfo/sheepdog-users</a><br>
</font></span></blockquote></div><br></div>