[sheepdog-users] monitor cluster to avoid corruption

Sat Dec 15 11:44:46 CET 2012

On 12/14/2012 10:39 PM, Valerio Pachera wrote:
> *If writes continue until the cluster is full, the disk gets corrupted*
>   collie vdi check test
>   Failed to read, No object found
> 
> I've been testing with only 1 vdi and 1 guest.
> If we have more disks, they might get corrupted as well.
> 
> Correct me if I'm wrong, but the only thing that can be done is to
> delete the vdi disk.
> 

This should be fixed by the QEMU patch I mentioned.

> To monitor when the cluster is getting full, we have
>   collie node info
> 
> It's pretty easy if all nodes have the same amount of space:
> we just have to look at the 'Total' percentage or that of any disk.
> It gets more difficult when we have different node sizes.
> 
> Here is an example, after writing 512M (formatted with 2 copies)
> ---
> collie  node info
> Id      Size    Used    Use%
>  0      982 MB  196 MB   19%
>  1      982 MB  160 MB   16%
>  2      982 MB  204 MB   20%
>  3      10.0 GB 528 MB    5%
> Total   13 GB   1.1 GB    8%
> Total virtual image size        10 GB
> ---
> 
> And here is the same cluster after writing data until all the
> available space was filled.
> ---
> (to the very end)
> collie  node info
> Id      Size    Used    Use%
>  0      982 MB  980 MB   99%
>  1      982 MB  796 MB   81%
>  2      982 MB  952 MB   96%
>  3      10.0 GB 2.5 GB   25%
> Total   13 GB   5.2 GB   40%
> Total virtual image size        10 GB
> ---
> 
> *Obviously we can't look at the 'Total' percentage to understand when
> the cluster is getting full.*

With nodes of different sizes, sheep just tries its best to balance the
data over all nodes. Sheepdog internally uses a hash function to place
objects onto nodes, and this is a kind of hash collision problem. We
make use of virtual nodes to mitigate this problem, and it works well
with multiple images (the more the better). But for a single VM, I think
it fails its goal. Well, single VM usage isn't practical.

Please try to test a more practical case, e.g., running dozens of VMs.
If the data still isn't balanced well, then we need to fix sheep.
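
To illustrate why the number of objects matters, here is a minimal
sketch of hash-based placement with virtual nodes. It is not sheep's
actual code: the SHA-1 hash, the node names, and the virtual node counts
(chosen roughly in proportion to disk size) are all assumptions made for
the example.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(vnodes):
    """Build a ring from a dict of node name -> virtual node count.

    The counts are hypothetical; here they stand in for disk size.
    """
    points = sorted(
        (_hash(f"{node}#{i}"), node)
        for node, count in vnodes.items()
        for i in range(count)
    )
    keys = [h for h, _ in points]
    owners = [n for _, n in points]
    return keys, owners


def locate(ring, obj_id):
    """Return the node owning the first virtual node at or after the hash."""
    keys, owners = ring
    idx = bisect.bisect_left(keys, _hash(obj_id)) % len(keys)  # wrap around
    return owners[idx]


def distribution(ring, n_objects, prefix="obj"):
    """Count how many of n_objects land on each node."""
    return Counter(locate(ring, f"{prefix}-{i}") for i in range(n_objects))


if __name__ == "__main__":
    # Three small nodes and one node roughly ten times larger.
    ring = build_ring({"node0": 10, "node1": 10, "node2": 10, "node3": 100})
    for n in (100, 100000):
        print(n, dict(distribution(ring, n)))
```

Running it with a small and a large object count lets you compare how
closely each node's share follows its virtual node count; with few
objects the spread tends to be noisier.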

> Think of a different scenario with several different node sizes (1T,
> 500G, 2T, 750G....).
> I challenge you to work out the total amount of available space and,
> more importantly, the total free space (percentage) of the cluster.
> 
> Would it be possible to print the 'Total relative available space' and
> the respective percentage?
> The current 'Total' is just the sum of the devices.
> If not, could you please tell me how to calculate it?
> 
> *Is it going to be possible to avoid disk corruption?*
> 

I think in this case, admins should parse the usage percentage of every
node and take the greatest one as the out-of-space indicator.
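
A rough sketch of such a check follows. It assumes the column layout
shown in the 'collie node info' output quoted above; the function names
and the 90% warning threshold are made up for the example, and the
sample text is the quoted output of the full cluster.

```python
import re

# One per-node row, e.g. " 3      10.0 GB 2.5 GB   25%".
# The 'Total' rows and the header do not start with a node id, so they
# are skipped.
ROW = re.compile(r"^\s*(\d+)\s+[\d.]+\s+\w+\s+[\d.]+\s+\w+\s+(\d+)%\s*$")

WARN_AT = 90  # hypothetical threshold, in percent


def node_usage(text):
    """Return a dict of node id -> Use% parsed from 'collie node info'."""
    usage = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            usage[int(m.group(1))] = int(m.group(2))
    return usage


def fullest_node(text):
    """Return (node id, Use%) of the node with the highest usage."""
    usage = node_usage(text)
    if not usage:
        raise ValueError("no node rows found")
    node_id = max(usage, key=usage.get)
    return node_id, usage[node_id]


SAMPLE = """\
Id      Size    Used    Use%
 0      982 MB  980 MB   99%
 1      982 MB  796 MB   81%
 2      982 MB  952 MB   96%
 3      10.0 GB 2.5 GB   25%
Total   13 GB   5.2 GB   40%
Total virtual image size        10 GB
"""

if __name__ == "__main__":
    node_id, pct = fullest_node(SAMPLE)
    status = "WARNING" if pct >= WARN_AT else "ok"
    print(f"{status}: fullest node is {node_id} at {pct}%")
```

In practice the text would come from running the command on one of the
nodes instead of from the SAMPLE string.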

Thanks,
Yuan