[Sheepdog] handling network disconnections?

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Sep 9 12:00:40 CEST 2010


At Tue, 07 Sep 2010 16:30:12 +0200,
Tomasz Chmielewski wrote:
> 
> How does sheepdog handle network disconnections?
> 
> 
> Assuming I have a virtual machine with data copies on two sheepdog nodes 

Note that the virtual machine connects to only one server in the
Sheepdog cluster, and that server works as a gateway.  The gateway is
localhost if you use sheepdog as the README says.  The vm doesn't know
which nodes are sheepdog members.

> - what would happen if:
> 
> 1) virtual machine is unable to connect to one sheepdog node for 1 
> minute; after that, connectivity is back,
> 

The vm can go on working if the reachable node is the vm's gateway.
In this case, sheepdog runs on only one node until connectivity to
the other node is back.  If the gateway is the unreachable one, the
connection between the vm and the gateway is closed, and the vm cannot
access sheepdog volumes.  After connectivity is back, the sheepdog
block driver needs to reconnect to the gateway to use the volumes
again, but unfortunately, we haven't implemented that yet.
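
To illustrate the kind of reconnect logic that is missing, here is a
minimal sketch in Python (not the actual block driver code).  The
function name connect_with_retry, the retry count, the delays, and the
port 7000 in the usage note are all assumptions for illustration only.

```python
import socket
import time

def connect_with_retry(host, port, attempts=5, base_delay=0.5,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to (re)connect to the gateway, doubling the delay after each failure.

    Hypothetical helper: `connect` and `sleep` are injectable so the
    logic can be exercised without a real gateway.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port))
        except OSError:
            if attempt == attempts:
                raise  # give up; the caller would have to fail the I/O
            sleep(delay)
            delay *= 2
```

A caller would use something like connect_with_retry("localhost", 7000)
and resend the outstanding requests once it returns a socket.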

In the case where the gateway node is localhost and becomes
unreachable because of maintenance, we can avoid the problem by
migrating the vm before the maintenance (though migrating with
sheepdog volumes is not supported yet).

> 
> 2) virtual machine is unable to connect to all sheepdog nodes for 1 
> minute; after that, connectivity is back,
>

The vm doesn't reconnect to the gateway, so the vm cannot access
sheepdog volumes even if connectivity is back...

> 
> 3) we have connectivity problems like above, but longer (say 5 minutes, 
> 15 minutes, 1 hour, 2 hours).
> 

Nothing happens to the vm if its gateway is reachable, though sheepdog
runs on only one node for a longer time.

If the gateway is the unreachable one, the result is the same as in 1)
and 2); the vm cannot use sheepdog volumes and we need to reboot it.
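
The three cases come down to whether the gateway is reachable, not how
long the outage lasts.  A rough summary as a Python sketch (the
function vm_outcome and its return strings are only illustrative, not
part of sheepdog):

```python
def vm_outcome(gateway_reachable: bool, other_nodes_reachable: bool) -> str:
    """Summarize what the vm sees, as described in this mail.

    The outage duration is deliberately not a parameter: it does not
    change the result.
    """
    if not gateway_reachable:
        # Cases 1) and 2) with the gateway cut off: no reconnect yet.
        return "volumes inaccessible; vm must be rebooted"
    if other_nodes_reachable:
        return "normal operation"
    # Gateway is fine, but the other node is cut off.
    return "vm keeps running; sheepdog runs on only one node"
```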

> 
> Think as typical maintenance (replacing switches, cabling) or operator 
> error.
> 
> 
> What would happen to the virtual machine? Would it just wait for IO and 
> continue to work as connectivity is back? Or, perhaps, would it see its 
> disk is gone (lots of IO errors in dmesg etc.)?
> 

If the vm cannot access the gateway, all I/O requests to the sheepdog
volumes result in EIO, and you will probably see lots of I/O errors in
dmesg.
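
For a program inside the guest, this shows up as a failed read or
write with errno set to EIO.  A small sketch of how an application
might detect it, in Python (read_or_none is a hypothetical helper, and
the injectable `read` argument is only there so the example can be
tried without a broken disk):

```python
import errno
import os

def read_or_none(fd, size, read=os.read):
    """Return the data read from fd, or None if the device reports EIO."""
    try:
        return read(fd, size)
    except OSError as e:
        if e.errno == errno.EIO:
            # The disk under the guest is gone; retrying will not help
            # until the volume is attached again.
            return None
        raise  # other errors are not related to the lost gateway
```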


Thanks,

Kazutaka


