[Sheepdog] [PATCH] collie: show better error message while node membership is changing
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Mon Dec 12 11:04:11 CET 2011
At Sun, 11 Dec 2011 17:57:54 +0000,
Chris Webb wrote:
>
> MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:
>
> > It looks a bit difficult to handle collie commands gracefully during
> > node membership changes. I think of showing an error message to force
> > users to retry the commands, and leaving this problem as a future work.
>
> Hi Kazutaka. If we defined an extra 'temporary failure; please retry' exit
> code for collie, automated systems would be able to detect this case and
> automatically wait and retry themselves too. It's probably fine to do
> something like that and rely on the layer that's calling collie to retry if
> that's easier to implement.
>
> Just a thought, but what happens to qemu VMs accessing sheepdog block
> devices when this happens?
In that case, the gateway sheep daemon, which is localhost by default,
will retry I/O requests automatically.
> Presumably they do hang, and then restart once the node membership
> is sorted?
Yes, until the new node membership is established, qemu I/Os will be
blocked. How long that takes depends on your corosync.conf settings and
the TCP connect timeout.
> But (also presumably), new qemu VMs who try to start during the
> change will fail? It would be nice to know that this has
> happened for a temporary reason too, but that might be harder to propagate
> out of qemu.
Yes. But after reading your mail, I guess it might be better to retry
collie's I/O requests in the gateway like qemu's ones. It needs a
slight change, but will support automatic retry simply. I'll send the
patch soon to check how it works.
Thanks,
Kazutaka
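For reference, the caller-side retry that Chris suggests earlier in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: collie does not define a 'temporary failure' exit code anywhere in this thread, so the value 75 (borrowed from EX_TEMPFAIL in BSD sysexits.h) and the helper name run_with_retry are hypothetical.

```python
import subprocess
import time

# Hypothetical exit code: nothing in this thread says collie uses it.
# 75 is EX_TEMPFAIL in BSD sysexits.h, reused here only as an example.
TEMPFAIL = 75


def run_with_retry(argv, attempts=5, delay=1.0,
                   run=subprocess.call, sleep=time.sleep):
    """Run argv, retrying while it exits with TEMPFAIL.

    Returns the exit status of the last run. `run` and `sleep` are
    parameters so the loop can be exercised without a real cluster.
    """
    status = TEMPFAIL
    for attempt in range(attempts):
        status = run(argv)
        if status != TEMPFAIL:
            # Success or a permanent error: hand it back to the caller.
            return status
        if attempt + 1 < attempts:
            # Give the cluster time to settle on a new membership.
            sleep(delay)
    return status
```

A calling layer would then use something like run_with_retry(["collie", "vdi", "list"]) and treat any status other than the temporary-failure code as final. A fixed delay is used for simplicity; a real wrapper would probably want a delay related to the corosync and TCP timeouts mentioned above.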