[Sheepdog] Segmentation faults and cluster failure

Liu Yuan namei.unix at gmail.com
Tue Sep 20 15:14:02 CEST 2011

On 09/20/2011 08:29 PM, Shawn Moore wrote:
>> So I guess you have shut down the cluster with the 'collie cluster shutdown'
>> command, no?
> I did not use the shutdown command because I was attempting to
> simulate what would happen if an entire zone went down.  For us a zone
> would be a datacenter (physically separated).
> I did go ahead and attempt to issue it now, but I get:
> [root@node174 ~]# collie cluster shutdown
> Waiting for other nodes joining
>> Would you please attach the logs from the nodes that would not
>> join?
> You can find the logs from the four nodes here:
> http://www.stormpoint.com/files/sheepdog_logs.tgz
Hi Shawn,

Thanks for the logs; they were helpful. I have root-caused the problem
(an epoch version mismatch during recovery), but unfortunately there is no
easy patch yet. I am going to cook up a patch that handles exactly this
problem soon.


