[sheepdog] Question about fix_object_consistency()

Tue Jun 19 10:09:46 CEST 2012

On 06/03/2012 02:47 PM, MORITA Kazutaka wrote:
> With the following scenarios, object replicas could have the different
> contents:
> 
>  - a gateway node fails while forwarding write requests
>  - total node failure happens while writing objects
> 
> In the such cases, it is okay for VMs not to read the latest data from
> the inconsistent objects because the VMs received EIO from them
> before.  However, it is still needed to fix the objects' inconsistency
> so that the VMs won't read the different data from the objects next
> time.

If a VM gets EIO for this object, the operation has failed, and we
should revert the partial operation on all the replicas. But this seems
very hard to achieve, so fix_object_consistency() does a minimal fix:
it just blindly ensures consistency between the replicas. This might do
for now, but the fix itself causes a problem:

Jun 08 14:32:42 queue_request(387) 1
Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48551
Jun 08 14:32:42 queue_request(387) 1
Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48552
Jun 08 14:32:42 queue_request(387) 1
Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48549
Jun 08 14:32:42 queue_request(387) 1
Jun 08 14:32:42 do_io_request(105) 1, dc4435000011be , 2
Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48550
Jun 08 14:32:42 do_local_io(52) 1, dc4435000011be , 2
Jun 08 14:32:42 queue_request(387) 2
Jun 08 14:32:42 do_io_request(105) 2, dc4435000011be , 2
Jun 08 14:32:42 do_local_io(52) 2, dc4435000011be , 2
Jun 08 14:32:42 do_io_request(111) failed: 2, dc4435000011be , 2, 3
Jun 08 14:32:42 io_op_done(119) leaving sheepdog cluster
Jun 08 14:32:42 client_rx_handler(588) connection from: 10.0.1.62:48551

fix_object_consistency() might be called from multiple threads at the
same time and cause trouble.
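
To make the concurrency concern concrete, here is a minimal sketch in
Python. It is not sheepdog code: ReplicaStore, _lock_for() and
run_concurrent_fixes() are hypothetical names, the replicas are plain
in-memory dicts, and picking the first readable replica as the source is
only an assumption about what a "blind" fix could look like. It shows one
possible way to serialize concurrent fix attempts per object; it does
nothing about reverting a partial write.

```python
import threading


class ReplicaStore:
    """Toy in-memory stand-in for object replicas (hypothetical)."""

    def __init__(self, nr_copies):
        # One dict per replica, mapping object id -> contents.
        self.replicas = [dict() for _ in range(nr_copies)]
        self._locks = {}
        self._locks_guard = threading.Lock()
        self.fix_count = 0

    def _lock_for(self, oid):
        # Hand out exactly one lock per object id.
        with self._locks_guard:
            return self._locks.setdefault(oid, threading.Lock())

    def fix_object_consistency(self, oid):
        """Blindly make all replicas of oid identical.

        Returns True if a repair was done, False if nothing was needed.
        """
        with self._lock_for(oid):
            contents = [r.get(oid) for r in self.replicas]
            if all(c == contents[0] for c in contents):
                return False  # already consistent, or absent everywhere
            # Assumption: the first readable replica is the source.
            source = next(c for c in contents if c is not None)
            for r in self.replicas:
                r[oid] = source
            self.fix_count += 1
            return True


def run_concurrent_fixes(store, oid, nr_threads):
    """Call fix_object_consistency() for one object from many threads."""
    threads = [
        threading.Thread(target=store.fix_object_consistency, args=(oid,))
        for _ in range(nr_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the per-object lock, only the first caller repairs the object and
the later callers find it already consistent. Without it, several
threads could read and rewrite the same replicas at once.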

So I am going to remove it completely, although I haven't come up with
any efficient means to handle partial replica writes caused by a failed
node. The current fix_object_consistency() already looks wrong enough to
be removed, and I think we shouldn't include it in the June release.
What do you think, Kazum?

Thanks,
Yuan