On 09/23/2011 03:51 AM, Shawn Moore wrote:
> I did go ahead this morning and pull down g066d753 and apply it to the
> four nodes. I brought up node174 first like usual, and then tried
> node173, but it still refused to join, but with a different message:
>
> .......
> Sep 22 08:28:49 update_cluster_info(756) status = 2, epoch = 17, 66, 0
> Sep 22 08:28:49 update_cluster_info(759) failed to join sheepdog, 66
> Sep 22 08:28:49 leave_cluster(1984) 16
> Sep 22 08:28:49 update_cluster_info(761) I am really hurt and gonna leave cluster.
> Sep 22 08:28:49 update_cluster_info(762) Fix yourself and restart me later, pleaseeeee...Bye.
> Sep 22 08:28:49 log_sigsegv(367) sheep logger exits abnormally, pid:24265
>
>
> I then brought up node157 and it recovered. Then I brought up node156
> and it recovered as well. Then I was able to bring up node173.
>
> I noticed some odd things (mostly related to sizes of "in use"
> changing) while looking at "collie node info" during the node
> startups. I'm not sure if this is normal or not, but below is what I
> saw. The timing is from top (oldest) to bottom (most current). Do
> objects re-distribute themselves around the cluster during recovery
> or epoch changes?
>
> node174 and node157:
> [root@node174 ~]# collie node info
> Id      Size     Used     Use%
>  0      382 GB   17 GB      4%
>  1      394 GB   17 GB      4%
>
> Total   775 GB   34 GB      4%, total virtual VDI Size  100 GB
>
> Then added node156:
> [root@node174 ~]# collie node info
> Id      Size     Used     Use%
>  0      365 GB   720 MB     0%
>  1      376 GB   12 GB      3%
>  2      380 GB   3.6 GB     0%
> failed to read object, 80f5969500000000 Remote node has a new epoch
> failed to read a inode header
> failed to read object, 80f5969600000000 Remote node has a new epoch
> failed to read a inode header
>
> Total   1.1 TB   16 GB      1%, total virtual VDI Size  0.0 MB
>
> [root@node174 ~]# collie node info
> Id      Size     Used     Use%
>  0      365 GB   1008 MB    0%
>  1      377 GB   12 GB      3%
>  2      382 GB   5.2 GB     1%
>
> Total   1.1 TB   19 GB      1%, total virtual VDI Size  100 GB
>
> Then after everyone is done with recovery, added node173 back:
> [root@node174 ~]# collie node info
> Id      Size     Used     Use%
>  0      374 GB   10 GB      2%
>  1      377 GB   12 GB      3%
>  2      399 GB   22 GB      5%
>
> Total   1.1 TB   45 GB      3%, total virtual VDI Size  100 GB
>
> [root@node174 ~]# collie node info
> Id      Size     Used     Use%
>  0      365 GB   496 MB     0%
>  1      366 GB   1.3 GB     0%
>  2      377 GB   792 MB     0%
>  3      394 GB   17 GB      4%
> failed to read object, 80f5969400000000 Remote node has a new epoch
> failed to read a inode header
>
> Total   1.5 TB   20 GB      1%, total virtual VDI Size  100 GB
>
> [root@node174 sheepdog]# collie node info
> Id      Size     Used     Use%
>  0      386 GB   21 GB      5%
>  1      381 GB   17 GB      4%
>  2      397 GB   21 GB      5%
>  3      394 GB   17 GB      4%
>
> Total   1.5 TB   76 GB      4%, total virtual VDI Size  100 GB
>
> But as far as I can tell, everything is working right now.

Hi Shawn,

Thanks for your testing. Currently, for a crashed cluster (where the nodes have different epochs), we can only recover the cluster if we follow a particular start-up order. Basically, there are two orders you can follow:

1. Start the node that crashed first, then start the other nodes (they will exit with the 'Bye' message) until the first node begins recovery. From that point on, you can safely join the remaining nodes to the cluster.

2. Start the node with the highest epoch version first, then start one other node. The first node should now begin recovery. From that point on, you can join the remaining nodes to the cluster.

Either order will recover the whole cluster to a functional state as long as no objects have been lost.
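To make the order concrete, here is a rough sketch of method 2 as shell commands. It assumes the usual 'sheep <store directory>' invocation; the host names, store path, and the use of ssh are placeholders for illustration, not taken from your setup:

    STORE=/var/lib/sheepdog          # assumed store directory, adjust to yours

    # 1. Start sheep on the node with the highest epoch first
    #    (the "epoch = ..." lines in the log above show what each node reports).
    ssh node174 "sheep $STORE"

    # 2. Start exactly one other node so the first node can begin recovery.
    ssh node157 "sheep $STORE"

    # 3. Once recovery has started on the first node, join the remaining
    #    nodes one at a time.
    ssh node156 "sheep $STORE"
    ssh node173 "sheep $STORE"

    # 4. Check membership and usage after each join.
    collie node info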
For now, I think method 2 would allow you to recover the cluster safely. But Kazutaka is cooking a patch to address the object loss in the first method.

BTW, would you please try 'collie cluster info' to check whether the outputs are consistent on each node?

Thanks,
Yuan
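P.S. If it helps, one quick way to compare the 'collie cluster info' output across the nodes is a small loop like the one below. The host names are placeholders, and it assumes collie is installed on each node and talks to the local sheep daemon:

    for n in node156 node157 node173 node174; do
        echo "=== $n ==="
        ssh "$n" collie cluster info
    done

The output should be the same on every node; any difference would mean the nodes still disagree about the cluster state.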