> So, what I had expected is the following scenario:
>
> If I lose more zones than the number of copies at the same time
> -> Shit happens! This should be very unlikely under normal
> conditions.
>
> But when I lose half or even more sheep at the same time, I
> think it should be possible to fall back to a recoverable state...

Yes, we need a solution for that case. Until then, don't use
sheepdog in a production environment. If you are a brave one,
use it anyway.

> > Next step is to write a best-practice guide on how to set up a
> > sheepdog cluster in the right way. All help is welcome.
>
> I am not very good at documenting things, but I will try
> my best ;-)

You are ;-)

> To answer David's question about best practice for updates,
> something like this should do it...
>
> The update scenario depends on whether you need a running cluster
> the whole time, or whether you can plan a complete shutdown for
> some time.
>
> If you need to run the cluster all the time, you have to kill
> the sheep daemons on one node, perform the update and restart the
> daemons. After that, wait for recovery to complete and proceed
> with the next node. After finishing with all nodes, run ''collie
> cluster cleanup''; this removes objects no longer needed on the
> nodes after a successful recovery.

Yes, I agree with this procedure. Do you know how to monitor or
display the recovery state? How do you do it?

> If you have a timeframe to shut down the cluster completely, it
> is maybe faster to use ''collie cluster shutdown'' (shut down
> all connected qemu instances first) to stop all sheep daemons on
> all nodes, which leaves the cluster in a clean state.
> Then perform the updates on all nodes and restart the daemons;
> the cluster starts working again once all original inhabitants
> are back alive on the farm.

Yes, I agree. This is also the only way to migrate from the 0.3.0
simple store to the 0.4.0 farm store.

Thanks,
Jens

> Cheers
>
> Bastian
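
P.S. The rolling-update procedure quoted above could be sketched as a
script like the one below. This is only a dry-run sketch, not a tested
procedure: the node names, the ssh access, the `/var/lib/sheepdog`
store path, the package-update command, and the `wait_for_recovery`
step are all assumptions you have to adapt to your own cluster. The
`run` helper just echoes each command, so nothing is executed as-is.

```shell
#!/bin/sh
# Dry-run sketch of the rolling update described in the mail above.
# Assumptions (NOT from the mail): nodes reachable via ssh, sheep
# started with a /var/lib/sheepdog store path, a hypothetical
# package-update command, and a wait_for_recovery step you must
# implement with whatever recovery monitoring your setup has.

run() { echo "WOULD RUN: $*"; }   # replace the echo with real execution

NODES="node1 node2 node3"          # hypothetical node names

for node in $NODES; do
    run ssh "$node" pkill sheep                   # stop the sheep daemon(s)
    run ssh "$node" update-sheepdog-package       # hypothetical update step
    run ssh "$node" sheep /var/lib/sheepdog       # restart the daemon
    run wait_for_recovery                         # poll until recovery is done
done

run collie cluster cleanup   # drop objects left over after recovery
```

Swapping the `echo` in `run` for real execution turns this into the
one-node-at-a-time procedure from the mail, with `collie cluster
cleanup` as the final step once every node has rejoined.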