[Sheepdog] failed node and space reclaiming
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Sun Jul 24 09:54:31 CEST 2011
At Sun, 24 Jul 2011 14:33:29 +1200,
Michael wrote:
> Hi All,
>
> Testing sheepdog.
>
> created a cluster of 3 nodes with --copies=2
> created a 2 GB VDI, started a VM, and wrote data to the VDI:
> dd if=/dev/zero of=/dev/vda bs=1M count=2000
>
> collie node info
> Id     Size    Used    Use%
>  0     20 GB   740 MB    3%
>  1     20 GB   636 MB    3%
>  2     4.8 GB  632 MB   12%
> Total  44 GB   2.0 GB    4%, total virtual VDI Size 2.0 GB
>
> so far so good.
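For reference, the setup above presumably corresponds to commands along
these lines (the VDI name 'test' is made up):

  $ collie cluster format --copies=2
  $ qemu-img create sheepdog:test 2G
  $ qemu-system-x86_64 ... -drive file=sheepdog:test,if=virtio
  (then, inside the guest)
  # dd if=/dev/zero of=/dev/vda bs=1M count=2000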
You specified '--copies=2' in the format option, so the total used
data size should be 2 x 2000 MB (= 3.9 GB), shouldn't it? Probably
the VM had not yet synced all the data at this point.
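To spell out the expected numbers: 2 copies x 2000 MB = 4000 MB, or
about 3.9 GB in the units collie reports, whereas the output above
shows only 2.0 GB used.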
>
> now simulate a node failure (on node 3, by killing sheep)
> after recovery completed:
> collie node info
> Id     Size    Used    Use%
>  0     20 GB   2.0 GB   10%
>  1     19 GB   1.9 GB    9%
> Total  39 GB   3.8 GB    9%, total virtual VDI Size 2.0 GB
>
> start sheep on node 3 again and wait for recovery; after it finished:
> collie node info
> Id     Size    Used    Use%
>  0     19 GB   1.4 GB    7%
>  1     19 GB   1.2 GB    6%
>  2     3.5 GB  1.2 GB   34%
> Total  41 GB   3.9 GB    9%, total virtual VDI Size 2.0 GB
>
> cause one more failure on the same node and bring it back again:
> collie node info
> Id     Size    Used    Use%
>  0     18 GB   1.4 GB    7%
>  1     18 GB   1.2 GB    6%
>  2     2.3 GB  1.2 GB   53%
> Total  39 GB   3.9 GB   10%, total virtual VDI Size 2.0 GB
>
> now node 3 is using 3x more space than it should, and nodes 1
> and 2 are using 2x.
I think this shows the correct used size.
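To spell out the arithmetic: the Used column sums to 1.4 GB + 1.2 GB
+ 1.2 GB = 3.8 GB, which is just the expected 2 copies x 2.0 GB of
VDI data. Node 3's Used stays at 1.2 GB across both recoveries; its
high Use% comes from its shrinking reported Size, not from holding
extra copies.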
>
> Also it seems that after a failure it tries to copy all the data to the
> other nodes. It could be a good idea to copy only the changed data (like drbd).
Yes, it would be nice to support differential copies for faster
object recovery. I've added it to our TODO list:
https://github.com/collie/sheepdog/issues/24
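For illustration only, here is a rough shell sketch of the idea in the
spirit of rsync; the store path and peer name are made up, and this is
not how sheep's recovery code actually works:

  #!/bin/sh
  # Fetch only the objects whose content differs from the peer's copy.
  STORE=/var/lib/sheepdog/obj   # hypothetical object directory
  PEER=node0                    # hypothetical healthy peer
  for obj in $(ssh "$PEER" "ls $STORE"); do
      rsum=$(ssh "$PEER" "md5sum $STORE/$obj" | cut -d' ' -f1)
      lsum=$(md5sum "$STORE/$obj" 2>/dev/null | cut -d' ' -f1)
      # copy the object only if it is missing locally or has changed
      if [ "$rsum" != "$lsum" ]; then
          scp "$PEER:$STORE/$obj" "$STORE/$obj"
      fi
  done

A real implementation would of course batch the comparison (e.g.
exchange a list of object checksums or version numbers in one round
trip) rather than doing one connection per object.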
Thanks,
Kazutaka