> On 2012-07-26 16:39, icez network wrote:
> > sheepdog v0.4.0 changed the storage backend from 'simple' to 'farm', so
> > some manual steps from the 'good practices' are required (especially
> > converting the storage backend from simple to farm).
>
> I guess that's not the problem (definitely not in my environment ;-)
>
> I am using farm as storage, but if I understand David correctly, he wants
> to update only the Debian package (both sheepdog 0.4, but with different
> startup scripts), so he already had farm storage...
>
> The problem, I think, is that he kills the sheeps manually instead of
> doing a cluster shutdown. In this case it seems to me that the sheeps
> invalidate their data. If he kills all sheep at the same time, then no
> sheep with valid data remains...
>
> A similar problem happened to me when I had a complete network failure
> and my three nodes couldn't see each other. I didn't find a way to
> recover from this situation...
>
> Cheers
> Bastian

I had several cases of data loss too while testing sheepdog 0.4.0. They
all had to do with killing a sheep instance. The latest init script in
deb pkg -10/-11 tries to cover this problem.

If you kill a sheep instance, all of its data is declared invalid. So you
have a problem if no remaining instance has a valid copy. Only collie
cluster shutdown shuts down all instances cleanly.

In the case of a crash, like your network error, you have a problem if no
single node holds a full copy. So 3 nodes must have 3 copies. Or use
redundant network links, so that situation can't happen.

For me, collie cluster recover and collie cluster cleanup sometimes work
after a kill/crash.

The next step is to write a best-practice guide on how to set up a
sheepdog cluster the right way. All help is welcome.

Cheers

Jens
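
P.S. As a starting point for the guide, here is a rough sketch of the
clean shutdown/restart sequence as I understand it. The store path
/var/lib/sheepdog is only an example from my setup; the recover/cleanup
part is simply what sometimes worked for me, not a guaranteed fix:

  # clean shutdown of the whole cluster (do NOT kill the sheep processes)
  collie cluster shutdown

  # restart: start the sheep daemon on every node again, pointing it at
  # its local store directory (path is just an example)
  sheep /var/lib/sheepdog

  # after a crash (e.g. a network failure), what sometimes worked for me
  collie cluster recover
  collie cluster cleanup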