[Sheepdog] handling network disconnections?

Thu Sep 9 16:02:13 CEST 2010

At Thu, 09 Sep 2010 12:27:11 +0200,
Tomasz Chmielewski wrote:
> 
> On 09.09.2010 12:00, MORITA Kazutaka wrote:
> 
> > If the VM cannot access the gateway, all I/O requests to the
> > sheepdog volumes result in EIO, and you will probably see lots of
> > I/O errors in dmesg.
> 
> Is it "fixable"?
> 
> For example, with iSCSI, I can set the initiator to try to reconnect to 
> the target for quite long (default is 2 minutes for open-iscsi, but we 
> can set it to hours, or even days).
> 
> This way, the whole setup is much more resilient to expected and 
> unexpected connectivity problems:
> 
> - set long timeouts for iSCSI on the initiator,
> 
> - connectivity between the target and the initiator is interrupted (i.e. 
> 5 minute maintenance to replace cabling and switches turned out to be a 
> 2 hour one, as additional problems were identified),
> 
> - on the guest, all processes which wanted to read (and data was not 
> cached/buffered already) or write, will be in "uninterruptible sleep", 
> but other than that, the guest system is working correctly,
> 
> - when connectivity is back, the initiator will reconnect, and the guest 
> will resume reading/writing and function correctly.
> 
> 
> Of course it meant that the guest was not usable for these 2 hours when 
> the initiator could not connect to the target; on the other hand, no 
> guest restart was needed and no filesystem corruption happened.
> 
> 
> Is it achievable (long term perhaps) with Sheepdog?
> 

No.  Sheepdog volumes appear to the guest as normal IDE (or SCSI)
disks, and if sheepdog blocks for a long time, a timeout will occur in
the guest OS.  I think other qemu block drivers behave the same way.
The only way to maintain a sheepdog cluster safely seems to be
migrating the virtual machines to other, unaffected host machines.
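
To make the two behaviours discussed above concrete, here is a minimal
Python sketch.  It is not Sheepdog, QEMU, or open-iscsi code: FakeBackend,
fail_fast_read and retrying_read are hypothetical names invented for this
illustration, and time is simulated rather than slept.  The
replacement_timeout parameter is only named after the open-iscsi setting
(node.session.timeo.replacement_timeout); any real block driver would
still be bounded by the guest OS disk timeout mentioned above.

```python
import errno


class FakeBackend:
    """Hypothetical storage backend, unreachable until `up_at` seconds."""

    def __init__(self, up_at):
        self.up_at = up_at

    def read(self, now):
        if now < self.up_at:
            raise ConnectionError("backend unreachable")
        return b"data"


def fail_fast_read(backend, now):
    """Fail-fast policy: a connectivity error is reported at once as EIO."""
    try:
        return backend.read(now)
    except ConnectionError:
        raise OSError(errno.EIO, "I/O error")


def retrying_read(backend, now, replacement_timeout, retry_interval=5):
    """Retry policy: keep the request pending until the timeout expires.

    Time is simulated: `now` is advanced by `retry_interval` instead of
    sleeping, so the sketch runs instantly.
    """
    deadline = now + replacement_timeout
    while True:
        try:
            return backend.read(now)
        except ConnectionError:
            if now + retry_interval > deadline:
                # Timeout exhausted: give up and surface EIO to the caller.
                raise OSError(errno.EIO, "I/O error")
            now += retry_interval


# A simulated 2-hour outage, as in the iSCSI example quoted above.
backend = FakeBackend(up_at=2 * 60 * 60)

# With a one-day timeout the read survives the outage.
print(retrying_read(backend, now=0, replacement_timeout=24 * 60 * 60))
```

With a 120-second timeout, or with fail_fast_read, the same request
raises OSError with errno EIO instead of returning data.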

Thanks,

Kazutaka