[Sheepdog] Cluster doesn't come up correctly after reboot
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Mon Apr 19 04:36:09 CEST 2010
Hi,
Thanks for your report.
On 2010/04/19 3:47, Wido den Hollander wrote:
> Hi,
>
> My sheepdog cluster isn't in production, so it gets rebooted a few times a week.
>
> I'm using the cluster for testing Ceph and Sheepdog, and this week I was
> playing more with Ceph than Sheepdog.
>
> Now I just checked my cluster and it seems that my nodes can't find
> each other anymore.
>
Sorry, the latest sheepdog in the git tree doesn't seem to handle node
joins well.
>
> I double-checked: collie is running on all 5 nodes and the sheepdog
> directory is mounted on all 5.
>
> Please note that this cluster was running fine a few days ago; nothing
> changed in the mount points, the corosync configuration, or anything else
> related to sheepdog.
>
> What I did notice is:
>
> root@osd1:~# shepherd info -t cluster
> there is inconsistency between epochs
>
> Ctime Epoch Nodes
> 10-04-15 17:24:00 4 [192.168.6.215:7000, 192.168.6.215:7000,
> 192.168.6.213:7000, 192.168.6.211:7000, 192.168.6.211:7000,
> 192.168.6.214:7000]
> root@osd1:~#
>
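The epoch 4 node list quoted above has 6 entries, although collie runs on only 5 nodes. As a quick way to spot that symptom, here is a minimal sketch of a hypothetical helper (not part of sheepdog, collie, or shepherd). It assumes the `ip:port` list format shown in the output and reports every address that appears more than once:

```python
import re
from collections import Counter

# Node list copied from the epoch 4 line of the `shepherd info -t cluster`
# output quoted above.
EPOCH_4_NODES = """[192.168.6.215:7000, 192.168.6.215:7000,
192.168.6.213:7000, 192.168.6.211:7000, 192.168.6.211:7000,
192.168.6.214:7000]"""


def parse_nodes(text: str) -> list[str]:
    """Extract every ip:port entry (assumed IPv4) from a node list."""
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}:\d+", text)


def duplicate_nodes(nodes: list[str]) -> dict[str, int]:
    """Return each address that occurs more than once, with its count."""
    return {addr: n for addr, n in Counter(nodes).items() if n > 1}


nodes = parse_nodes(EPOCH_4_NODES)
print(len(nodes), len(set(nodes)))  # total entries vs. distinct addresses
print(duplicate_nodes(nodes))
```

For this output, the helper finds 6 entries but only 4 distinct addresses, with 192.168.6.215:7000 and 192.168.6.211:7000 each listed twice.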
Let me ask a few questions to clarify. Did you run `shepherd shutdown' before
stopping the collie processes? Do the problems occur only when rebooting
sheepdog? Does a clean startup always work without problems?
> Creating a new image also fails.
>
> root@osd1:~# /usr/local/bin/qemu-img create -f sheepdog johndoe 10G
> Formatting 'johndoe', fmt=sheepdog size=10737418240
> do_sd_create 1143: Invalid error code, johndoe
> qemu-img: Error while formatting
> root@osd1:~#
>
> I got the cluster running again after clearing all the sheepdog
> directories and doing a mkfs again, but this shouldn't happen; a cluster
> should survive several reboots, shouldn't it?
>
Yes, it should. We'll fix this problem as soon as possible.
> After rebooting my machines, the sheepdog cluster was unstable again.
> Same result: the nodes couldn't find each other.
>
Thanks,
Kazutaka Morita