[sheepdog-users] cluster data distribution
Hitoshi Mitake
mitake.hitoshi at gmail.com
Thu May 9 18:00:06 CEST 2013
At Thu, 9 May 2013 17:04:26 +0200,
Valerio Pachera wrote:
>
> 2013/5/9 Liu Yuan <namei.unix at gmail.com>:
> > Since we don't change weights when disks are plugged/unplugged, there is no
> > way to rebalance data. If we allowed weight changes on plug/unplug, we
> > would have to pay a price: plugging/unplugging one disk would trigger
> > recovery of the whole cluster.
>
> Looking at it from the other side:
> my node uses two disks: 2T + 500G.
> What happens if I unplug the 2T disk now? It contains lots of data.
> That data can't be redistributed onto the node's other disk,
> so it has to trigger a cluster recovery.
> When I plug the disk back in, this node is not going to be used much.
>
> root@sheepdog001:~# collie node md info
> Id  Size    Use     Path
>  0  1.2 TB  593 GB  /mnt/ST2000DM001-1CH164_W1E2N5G6/obj
>  1  212 GB  253 GB  /mnt/wd_WMAYP0904279
>
> I do not like the idea of running more sheeps on a single node.
> I think it's dangerous because the number of sheeps on a node may be larger
> than the number of copies.
> If the whole host dies, several nodes die with it.
This problem can be solved by giving the sheeps on the same host the same
zone ID (with the -z option). Sheeps that share a zone ID are never used to
store copies of the same object, so losing one host costs at most one copy.
> That's why I like md approach.
But md is a smarter way to manage multiple disks on a single physical
host. Running multiple sheeps on a single physical host should be avoided.
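The zone rule can be illustrated with a minimal sketch. It assumes a consistent-hash ring with virtual nodes, where replica placement walks clockwise from the object's hash and skips any node whose zone already holds a copy. The names `Ring`, `place` and `h64`, and the use of MD5 for hashing, are illustrative assumptions, not sheepdog's actual code.

```python
import bisect
import hashlib


def h64(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring with virtual nodes; each node has a zone."""

    def __init__(self, nodes, vnodes_per_node=64):
        # nodes: dict mapping node name -> zone id (hypothetical layout)
        self.zone_of = dict(nodes)
        self.points = sorted(
            (h64(f"{name}#{i}"), name)
            for name in nodes
            for i in range(vnodes_per_node)
        )
        self.keys = [p for p, _ in self.points]

    def place(self, oid: str, copies: int):
        """Walk clockwise from the object's hash, taking at most one node per zone."""
        start = bisect.bisect(self.keys, h64(oid))
        chosen, used_zones = [], set()
        for step in range(len(self.points)):
            _, name = self.points[(start + step) % len(self.points)]
            zone = self.zone_of[name]
            if zone in used_zones:
                continue  # this zone already holds a copy: skip the node
            chosen.append(name)
            used_zones.add(zone)
            if len(chosen) == copies:
                break
        return chosen
```

For example, with `Ring({"hostA-sheep1": 1, "hostA-sheep2": 1, "hostB": 2, "hostC": 3})`, every `place(oid, 3)` returns three nodes from three different zones, so the two sheeps on hostA never hold two copies of one object. If more copies are requested than there are zones, this sketch simply returns fewer nodes.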
>
> I think a recovery/rebalance is needed.
> Maybe not automatically, but via a collie command, so we can choose to
> trigger it when the cluster is less loaded.
The problem comes from the current sheepdog implementation, which
rebalances only when a node joins or leaves. This should be more
flexible. I have ongoing changes that allow rebalancing without a
join/leave event; they should apply to your problem. I'll send the
patches once the implementation is done.
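Rebalancing without a join/leave event can be sketched as: rebuild the placement with new disk weights and move only the objects whose owner changed. This minimal sketch assumes a per-node consistent-hash ring over disks, with a virtual-node count proportional to each disk's weight. `build_ring`, `owner` and `plan_rebalance` are hypothetical helpers, not sheepdog's patches or a collie subcommand.

```python
import bisect
import hashlib


def h64(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(weights):
    """weights: dict disk -> weight; each disk gets `weight` virtual nodes."""
    points = sorted(
        (h64(f"{disk}#{i}"), disk)
        for disk, w in weights.items()
        for i in range(w)
    )
    return [p for p, _ in points], [d for _, d in points]


def owner(ring, oid):
    """Return the disk owning `oid`: the first ring point clockwise of its hash."""
    keys, disks = ring
    return disks[bisect.bisect(keys, h64(oid)) % len(keys)]


def plan_rebalance(oids, old_weights, new_weights):
    """List (oid, src, dst) moves needed when disk weights change."""
    old, new = build_ring(old_weights), build_ring(new_weights)
    moves = []
    for oid in oids:
        src, dst = owner(old, oid), owner(new, oid)
        if src != dst:
            moves.append((oid, src, dst))
    return moves
```

Because raising one disk's weight only adds ring points, every planned move goes to that disk and all other objects stay put. An on-demand rebalance (for instance, one triggered by an administrator when the cluster is less loaded) could then apply such a plan without touching other nodes.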
Thanks,
Hitoshi