> So, what I had expected is the following scenario:
>
> If I lose more zones than the number of copies at the same time
> -> Shit happens! This should be very unlikely under normal
> conditions.
>
> But when I lose half or even more sheep at the same time, I
> think it should be possible to fall back to a recoverable state...

Yes, we need a solution for that case. Until then, don't use
sheepdog in a production environment. If you are a brave one,
use it anyway.

> > Next step is to write a best-practice guide on how to set up a
> > sheepdog cluster in the right way. All help is welcome.
>
> I am not very good at documenting things, but I will try
> my best ;-)

You are ;-)

> To answer David's question about best practice for updates,
> something like this should do it...
>
> The update scenario depends on whether you need a running cluster
> the whole time, or whether you can plan a complete shutdown for
> some time.
>
> If you need to run the cluster all the time, you have to kill
> the sheep daemons on one node, perform the update and restart the
> daemons. After that, wait for recovery to complete and proceed
> with the next node. After finishing with all nodes, run ''collie
> cluster cleanup''; this removes objects no longer needed on the
> nodes after a successful recovery.

Yes, I agree with this procedure. Do you know how to monitor or
display the recovery state? How do you do it?

> If you have a timeframe to shut down the cluster completely, it
> is maybe faster to use ''collie cluster shutdown'' (shut down
> all connected qemu instances first) to stop all sheep daemons on
> all nodes, which leaves the cluster in a clean state.
> Then perform the updates on all nodes and restart the daemons;
> the cluster starts working again once all original inhabitants
> are back alive on the farm.

Yes, I agree. This is also the only way to migrate from the 0.3.0
simple store to the 0.4.0 farm store.

Thanks,
Jens

> Cheers
>
> Bastian
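
P.S. The rolling-update procedure quoted above could be sketched as a
script like the one below. This is only a dry-run sketch, not a tested
procedure: the node names, the ssh access, the `/var/lib/sheepdog`
store path, the package-update command, and the `wait_for_recovery`
step are all assumptions you have to adapt to your own cluster. The
`run` helper just echoes each command, so nothing is executed as-is.

```shell
#!/bin/sh
# Dry-run sketch of the rolling update described in the mail above.
# Assumptions (NOT from the mail): nodes reachable via ssh, sheep
# started with a /var/lib/sheepdog store path, a hypothetical
# package-update command, and a wait_for_recovery step you must
# implement with whatever recovery monitoring your setup has.

run() { echo "WOULD RUN: $*"; }   # replace the echo with real execution

NODES="node1 node2 node3"          # hypothetical node names

for node in $NODES; do
    run ssh "$node" pkill sheep                   # stop the sheep daemon(s)
    run ssh "$node" update-sheepdog-package       # hypothetical update step
    run ssh "$node" sheep /var/lib/sheepdog       # restart the daemon
    run wait_for_recovery                         # poll until recovery is done
done

run collie cluster cleanup   # drop objects left over after recovery
```

Swapping the `echo` in `run` for real execution turns this into the
one-node-at-a-time procedure from the mail, with `collie cluster
cleanup` as the final step once every node has rejoined.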