> On 2012-07-26 16:39, icez network wrote:
> > sheepdog v0.4.0 changed the storage backend from 'simple' to 'farm', so
> > some manual steps from the 'good practices' are required (especially
> > converting the storage backend from simple to farm).
>
> I guess that's not the problem (definitely not in my environment ;-)
>
> I am using farm as storage, but if I understand David correctly, he wants
> to update only the Debian package (both sheepdog 0.4, but with different
> startup scripts), so he already had farm storage...
>
> The problem, I think, is that he kills the sheeps manually instead of
> doing a cluster shutdown. In this case it seems to me that the sheeps
> invalidate their data. If he kills all sheep at the same time, then no
> sheep with valid data remains...
>
> A similar problem happened to me when I had a complete network failure
> and my three nodes couldn't see each other. I didn't find a way to
> recover from this situation...
>
> Cheers
> Bastian

I had several cases of data loss too while testing sheepdog 0.4.0. They
all had to do with killing a sheep instance. The latest init script in
deb pkg -10/-11 tries to cover this problem.

If you kill a sheep instance, all of its data is declared invalid. So you
have a problem if no remaining instance has a valid copy. Only collie
cluster shutdown shuts down all instances cleanly.

In the case of a crash, like your network error, you have a problem if no
single node holds a full copy. So 3 nodes must have 3 copies. Or use
redundant network links, so that situation can't happen.

For me, collie cluster recover and collie cluster cleanup sometimes work
after a kill/crash.

The next step is to write a best-practice guide on how to set up a
sheepdog cluster the right way. All help is welcome.

Cheers

Jens
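
P.S. As a starting point for the guide, here is a rough sketch of the
clean shutdown/restart sequence as I understand it. The store path
/var/lib/sheepdog is only an example from my setup; the recover/cleanup
part is simply what sometimes worked for me, not a guaranteed fix:

  # clean shutdown of the whole cluster (do NOT kill the sheep processes)
  collie cluster shutdown

  # restart: start the sheep daemon on every node again, pointing it at
  # its local store directory (path is just an example)
  sheep /var/lib/sheepdog

  # after a crash (e.g. a network failure), what sometimes worked for me
  collie cluster recover
  collie cluster cleanup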