[Sheepdog] qemu-img convert slowness and high availability status
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Thu Jun 16 04:43:04 CEST 2011
At Wed, 15 Jun 2011 14:24:37 +0200,
krim son wrote:
> Ok I was able to reproduce, I believe this happens when booting a VM with a
> sheepdog volume fails (because the sheep daemon was down). Here's the
> output:
>
> node 1:
> # sheep -f /data/sheep/
> sheep: jrnl_recover(2221) Openning the directory
> /data/sheep//journal/00000009/.
> sheep: set_addr(1595) addr = 172.16.1.1, port = 7000
> sheep: main(144) Sheepdog daemon (version 0.2.3) started
> sheep: get_cluster_status(403) sheepdog is waiting with older epoch, 10 9
> 172.16.1.2:7000
>
> node 2:
> # sheep -f /data/sheep/
> sheep: jrnl_recover(2221) Openning the directory
> /data/sheep//journal/00000010/.
> sheep: set_addr(1595) addr = 172.16.1.2, port = 7000
> sheep: main(144) Sheepdog daemon (version 0.2.3) started
> sheep: send_join_request(1048) 33624236 22579
> sheep: update_cluster_info(568) failed to join sheepdog, 65
>
> # collie cluster info -a 172.16.1.1
> Waiting for other nodes joining
>
> Ctime                Epoch  Nodes
> 2011-06-15 11:50:16      9  [172.16.1.1:7000, 172.16.1.2:7000]
> # collie cluster info -a 172.16.1.2
> The node had failed to join sheepdog
>
> Ctime                Epoch  Nodes
> 2011-06-15 11:50:16     10  [172.16.1.2:7000]
You did something like the following, didn't you?
1. kill sheep on 172.16.1.1
2. kill sheep on 172.16.1.2
(Sheepdog stops completely)
3. start sheep on 172.16.1.1
4. start sheep on 172.16.1.2
After the sheep daemon on 172.16.1.1 went down, Sheepdog continued to
work with a single node (172.16.1.2), so only that node reached epoch
10 while 172.16.1.1 stayed at epoch 9. So, to start Sheepdog again,
you need to start the sheep daemon on 172.16.1.2 first, and then add
the daemon on 172.16.1.1. Of course, it would be better for Sheepdog
to fix the node membership inconsistency automatically, but that is a
bit difficult.
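
As a rough sketch of the ordering rule above (this is not part of
Sheepdog; restart_order and last_epochs are hypothetical names used
only for illustration), the daemons can be sorted so that the node
with the newest epoch is started first. The epoch values are the ones
shown in the 'collie cluster info' output quoted above.

```python
def restart_order(last_epochs):
    """Return node addresses sorted so the newest epoch comes first.

    last_epochs maps a node address to the last epoch that node saw.
    Assumption drawn from this thread: the node that kept running
    longest has the highest epoch and must be started first.
    """
    return sorted(last_epochs, key=lambda node: last_epochs[node], reverse=True)


# Epochs taken from the quoted 'collie cluster info' output.
last_epochs = {"172.16.1.1:7000": 9, "172.16.1.2:7000": 10}

for node in restart_order(last_epochs):
    print("start sheep on", node)
```

With the values above, 172.16.1.2 comes before 172.16.1.1, which
matches the manual order described in this mail.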
If you want to stop Sheepdog more safely, please run 'collie cluster
shutdown' instead of killing the sheep daemons one by one.
Thanks,
Kazutaka