MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:

> It looks a bit difficult to handle collie commands gracefully during
> node membership changes. I think of showing an error message to force
> users to retry the commands, and leaving this problem as a future work.

Hi Kazutaka.

If we defined an extra 'temporary failure; please retry' exit code for
collie, automated systems would be able to detect this case and
automatically wait and retry themselves too. It's probably fine to do
something like that and rely on the layer that's calling collie to retry
if that's easier to implement.

Just a thought, but what happens to qemu VMs accessing sheepdog block
devices when this happens? Presumably they do hang, and then restart once
the node membership is sorted? But (also presumably), new qemu VMs who try
to start during the change will fail? It would be nice to know that this
has happened for a temporary reason too, but that might be harder to
propagate out of qemu.

Cheers,

Chris.
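
To make the 'temporary failure; please retry' idea concrete, here is a
rough sketch of what the calling layer could look like. It is only an
illustration under stated assumptions: collie does not define such an exit
code today, so EXIT_TEMPFAIL is hypothetical (the value 75 is borrowed from
EX_TEMPFAIL in BSD sysexits.h), and run_with_retry is a made-up helper
name, not part of sheepdog.

```python
import subprocess
import time

# Hypothetical: collie has no such exit code today. The value 75 is
# borrowed from EX_TEMPFAIL in BSD sysexits.h purely for illustration.
EXIT_TEMPFAIL = 75


def run_with_retry(argv, attempts=5, delay=1.0,
                   sleep=time.sleep, run=subprocess.call):
    """Run argv, retrying while it exits with EXIT_TEMPFAIL.

    Returns the last exit status seen. `sleep` and `run` are injectable
    so the retry logic can be exercised without a real collie binary.
    """
    status = EXIT_TEMPFAIL
    for attempt in range(attempts):
        status = run(argv)
        if status != EXIT_TEMPFAIL:
            # Success or a permanent failure: hand it straight back.
            return status
        if attempt < attempts - 1:
            # Temporary failure: wait for membership to settle, then retry.
            sleep(delay)
    return status
```

A caller would pass the argv of whatever collie command it wants to run and
treat a final status of EXIT_TEMPFAIL as "still changing membership, give up
for now". A fixed delay keeps the sketch short; a real caller might prefer
some backoff.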