[Sheepdog] handling network disconnections?

Thu Sep 9 12:27:11 CEST 2010

On 09.09.2010 12:00, MORITA Kazutaka wrote:

> If the VM cannot access the gateway, all the I/O requests to the
> sheepdog volumes result in EIO, and probably lots of I/O errors in
> dmesg.

Is it "fixable"?

For example, with iSCSI, I can set the initiator to keep trying to
reconnect to the target for quite a long time (the default is 2 minutes
for open-iscsi, but we can set it to hours, or even days).
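
For reference, the open-iscsi knob I mean is
node.session.timeo.replacement_timeout, given in seconds. A minimal
config fragment, assuming the usual iscsid.conf location (the path may
differ per distribution) and an arbitrary example value of one day:

```
# /etc/iscsi/iscsid.conf (path is an assumption, varies by distribution)
# Seconds to wait for the session to be re-established before
# failing queued I/O back to the upper layers. Default: 120.
node.session.timeo.replacement_timeout = 86400
```

As far as I know, this only applies to node records created after the
change; existing records keep their own value until updated.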

This way, the whole setup is much more resilient to expected and
unexpected connectivity problems:

- set long timeouts for iSCSI on the initiator,

- connectivity between the target and the initiator is interrupted (e.g.
a 5-minute maintenance window to replace cabling and switches turns into
a 2-hour one, as additional problems are identified),

- on the guest, all processes which want to write, or to read data that
is not already cached/buffered, will be in "uninterruptible sleep",
but other than that, the guest system keeps working correctly,

- when connectivity is back, the initiator will reconnect, and the guest
will resume reading/writing and function correctly.
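
To illustrate the behaviour described in the steps above, here is a
minimal Python sketch of "block and retry until a timeout" instead of
returning EIO right away. It is not Sheepdog, QEMU or open-iscsi code:
Backend, read_with_retry, replacement_timeout and retry_interval are
hypothetical names made up for this example.

```python
import errno
import time


class Backend:
    """Hypothetical storage backend, unreachable until `up_at` (monotonic seconds)."""

    def __init__(self, up_at):
        self.up_at = up_at

    def read(self, offset, length):
        if time.monotonic() < self.up_at:
            raise ConnectionError("backend unreachable")
        # Return zero-filled data to stand in for a real read.
        return bytes(length)


def read_with_retry(backend, offset, length, replacement_timeout, retry_interval=0.01):
    """Retry while the backend is down; fail with EIO only after the timeout."""
    deadline = time.monotonic() + replacement_timeout
    while True:
        try:
            return backend.read(offset, length)
        except ConnectionError:
            if time.monotonic() >= deadline:
                # Only now does the caller see an I/O error.
                raise OSError(errno.EIO, "I/O error: reconnect timeout expired")
            # The caller stays blocked here, like a process in
            # uninterruptible sleep on the guest.
            time.sleep(retry_interval)
```

With a long replacement_timeout the caller simply waits out the outage;
with a short one it gets EIO, which is roughly what happens today.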

Of course, it meant that the guest was not usable for the 2 hours during
which the initiator could not reach the target; on the other hand, no
guest restart was needed and no filesystem corruption happened.

Is this achievable (perhaps in the long term) with Sheepdog?

-- 
Tomasz Chmielewski
http://wpkg.org