[Sheepdog] failed node and space reclaiming

Sun Jul 24 04:33:29 CEST 2011

Hi All,

I am testing Sheepdog.

I created a 3-node cluster with --copies=2, created a 2 GB VDI, then started
a VM and wrote data to the VDI:
dd if=/dev/zero of=/dev/vda bs=1M count=2000

collie node info
Id      Size    Used    Use%
 0      20 GB   740 MB    3%
 1      20 GB   636 MB    3%
 2      4.8 GB  632 MB   12%
Total   44 GB   2.0 GB    4%, total virtual VDI Size    2.0 GB

So far, so good.

Now I simulate a node failure (on node 3, by killing sheep).
After recovery completed:
collie node info
Id      Size    Used    Use%
 0      20 GB   2.0 GB   10%
 1      19 GB   1.9 GB    9%
Total   39 GB   3.8 GB    9%, total virtual VDI Size    2.0 GB

Start sheep on node 3 again and wait for recovery. After it finished:
collie node info
Id      Size    Used    Use%
 0      19 GB   1.4 GB    7%
 1      19 GB   1.2 GB    6%
 2      3.5 GB  1.2 GB   34%
Total   41 GB   3.9 GB    9%, total virtual VDI Size    2.0 GB

One more failure on the same node, then bring it back again:
collie node info
Id      Size    Used    Use%
 0      18 GB   1.4 GB    7%
 1      18 GB   1.2 GB    6%
 2      2.3 GB  1.2 GB   53%
Total   39 GB   3.9 GB   10%, total virtual VDI Size    2.0 GB

Now node 3 (Id 2) is using about 3x more space than it should, and nodes 1
and 2 (Ids 0 and 1) about 2x.
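A rough check of that estimate, as a sketch. It assumes that `Size` in `collie node info` reports the capacity available to sheep, so disk taken by leftover objects shows up as a drop in `Size` while `Used` stays flat. Under that assumption, the shrinkage plus `Used` approximates each node's real footprint after the second failure. The figures come from the tables above, which are rounded, so the ratios are approximate; `overhead` is a hypothetical helper name.

```python
# Rough footprint estimate from the `collie node info` tables above (values in GB).
# Assumption: "Size" shrinks by the disk space taken by leftover objects that
# "Used" no longer counts. Table values are rounded, so ratios are approximate.

def overhead(size_before, size_after, used_after):
    """Return (estimated on-disk GB, ratio of that estimate to reported Used)."""
    occupied = (size_before - size_after) + used_after
    return occupied, occupied / used_after

# (Size before any failure, Size after second failure, Used after second failure)
nodes = {
    0: (20.0, 18.0, 1.4),
    1: (20.0, 18.0, 1.2),
    2: (4.8, 2.3, 1.2),
}

for node_id, figures in nodes.items():
    occupied, ratio = overhead(*figures)
    print(f"Id {node_id}: ~{occupied:.1f} GB on disk vs {figures[2]:.1f} GB used ({ratio:.1f}x)")
```

With these inputs it reports roughly 2.4x and 2.7x for Ids 0 and 1 and about 3.1x for Id 2, in line with the 2x/3x estimate.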

It also seems that after a failure, sheep tries to copy all of the data over
to the other nodes. It might be a good idea to copy only the changed data
(like DRBD does). Is there any way to keep the data in sync and delete the
old data?
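The DRBD-style idea can be illustrated with a small sketch. This is not how sheep recovery works; it assumes, hypothetically, that every object carries a version number bumped on each write, and that a rejoining node can learn which objects it should own under the new membership. With that information the node fetches only objects whose version differs and deletes objects it no longer owns. `plan_rejoin` and its arguments are illustrative names only.

```python
# Hypothetical delta resync + cleanup for a rejoining node.
# Assumption: each object has a version counter bumped on every write;
# this is NOT Sheepdog's actual on-disk format or recovery code.

def plan_rejoin(local, authoritative, owned_ids):
    """Return (to_fetch, to_delete) as sorted lists of object ids.

    local:         dict object_id -> version stored on the rejoining node
    authoritative: dict object_id -> current version on the surviving replicas
    owned_ids:     set of object ids this node should hold after rejoining
    """
    # Copy only objects we should own that are missing or out of date.
    to_fetch = sorted(
        oid for oid in owned_ids
        if oid in authoritative and local.get(oid) != authoritative[oid]
    )
    # Drop objects we no longer own, or that no longer exist at all.
    to_delete = sorted(
        oid for oid in local
        if oid not in owned_ids or oid not in authoritative
    )
    return to_fetch, to_delete


local = {"a": 1, "b": 3, "c": 2}
authoritative = {"a": 1, "b": 4, "c": 2, "d": 1}
owned = {"a", "b", "d"}
print(plan_rejoin(local, authoritative, owned))
```

Here only `b` (stale) and `d` (missing) would be copied, while `c`, left behind from an earlier placement, is deleted instead of lingering on disk.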

-- 
Michael