[sheepdog] Question about fix_object_consistency()

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Wed Jun 20 05:57:55 CEST 2012


At Tue, 19 Jun 2012 16:09:46 +0800,
Liu Yuan wrote:
> 
> On 06/03/2012 02:47 PM, MORITA Kazutaka wrote:
> > In the following scenarios, object replicas could end up with
> > different contents:
> > 
> >  - a gateway node fails while forwarding write requests
> >  - total node failure happens while writing objects
> > 
> > In such cases, it is okay for VMs not to read the latest data from
> > the inconsistent objects because the VMs already received EIO for
> > them.  However, we still need to fix the objects' inconsistency
> > so that the VMs won't read different data from the objects next
> > time.
> 
> If a VM gets an EIO for this object, the operation has failed and we
> should revert the partial operation on all the replicas. But it seems this

If a VM gets EIO, it doesn't make any assumption about the data
written, so we don't necessarily need to revert the operation; it's
also okay to complete the partial operation.  All we need to do is
make the replicas consistent with each other.

> is very hard to achieve, so fix_object_consistency() does a minimal fix:
> it just blindly enforces consistency between replicas. This might do
> for now, but the fix itself causes a problem:
> 
> Jun 08 14:32:42 queue_request(387) 1
> Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
> Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
> Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48551
> Jun 08 14:32:42 queue_request(387) 1
> Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
> Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
> Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48552
> Jun 08 14:32:42 queue_request(387) 1
> Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
> Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
> Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48549
> Jun 08 14:32:42 queue_request(387) 1
> Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
> Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48550
> Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
> Jun 08 14:32:42 queue_request(387) 2
> Jun 08 14:32:42 do_io_request(105) 2, dc4435000011be , 2
> Jun 08 14:32:42 do_local_io(52) 2, dc4435000011be , 2
> Jun 08 14:32:42 do_io_request(111) failed: 2, dc4435000011be , 2, 3
> Jun 08 14:32:42 io_op_done(119) leaving sheepdog cluster
> Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48551
> 
> fix_object_consistency() might be called in multiple threads and cause
> trouble.
> 
> So I am going to remove it completely, although I haven't come up with
> any efficient means to handle partial replica writes caused by a failed
> node. The current fix_object_consistency() already looks wrong enough to
> be removed, and I think we shouldn't include it in the June release. What
> do you think, Kazum?

Okay, let's remove it for now.

How about adding an offline vdi check/repair operation like fsck to
collie to fix object inconsistency?  I guess it would be enough for
actual use cases.
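
As a rough illustration only, such a check/repair pass could look like
the sketch below.  It is not sheepdog or collie code: struct replica,
check_object() and repair_object() are hypothetical names, and the
replicas are modelled as in-memory buffers rather than objects fetched
from nodes.  Replica 0 is arbitrarily taken as the reference copy,
which is only acceptable because, after EIO, either completing or
reverting the partial write is fine as long as all copies agree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one copy of an object. */
struct replica {
	unsigned char *data;	/* object contents held by this copy */
};

/* Check pass: count the replicas whose contents differ from replica 0. */
static size_t check_object(const struct replica *r, size_t nr, size_t len)
{
	size_t i, bad = 0;

	assert(nr > 0);
	for (i = 1; i < nr; i++)
		if (memcmp(r[0].data, r[i].data, len) != 0)
			bad++;
	return bad;
}

/*
 * Repair pass: overwrite every differing replica with replica 0
 * (assumption: any single copy is a valid reference after EIO).
 * Returns the number of replicas that were rewritten.
 */
static size_t repair_object(struct replica *r, size_t nr, size_t len)
{
	size_t i, fixed = 0;

	assert(nr > 0);
	for (i = 1; i < nr; i++) {
		if (memcmp(r[0].data, r[i].data, len) != 0) {
			memcpy(r[i].data, r[0].data, len);
			fixed++;
		}
	}
	return fixed;
}
```

Because the pass would run offline, with no VM writing to the vdi, it
does not have to deal with the concurrent-caller problem seen in the
log above.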

Thanks,

Kazutaka


