[sheepdog] [PATCH 2/2] collie: add a check&repair command
Liu Yuan
namei.unix at gmail.com
Mon Jun 25 13:01:01 CEST 2012
On 06/25/2012 06:51 PM, Christoph Hellwig wrote:
> On Wed, Jun 20, 2012 at 06:16:02PM +0800, Liu Yuan wrote:
>> "In the following scenarios, object replicas could have different
>> contents:
>>
>> - a gateway node fails while forwarding write requests
>> - total node failure happens while writing objects
>>
>> In such cases, it is okay for VMs not to read the latest data from
>> the inconsistent objects because the VMs received EIO from them
>> before. However, the objects' inconsistency still needs to be fixed
>> so that the VMs won't read different data from the objects next
>> time."
>>
>> So when either of those two cases happens, users are expected to run:
>>
>> $ collie check affected_vdi_name
>
> Requiring manual user intervention when a node goes down in a
> distributed storage system is entirely unacceptable. I'm happy to kill
> the dumb version of the consistency fix, but in exchange sheepdog needs
> to have a better internal method to deal with this failure instead of
> bailing out.
>
This command is a last resort for fixing consistency, like fsck, and it
doesn't rule out adding an automatic recovery mechanism built into the
sheep core. The biggest reason to remove fix_object_consistency() is that
it is simply broken in some cases.
Thanks,
Yuan
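
For illustration only, here is a minimal sketch of what an fsck-like
check&repair pass over object replicas could look like. It is not the
code from this patch: the function name check_and_repair, the use of
SHA-1 digests, and the majority-vote rule for picking the good copy are
all assumptions made for the example. The thread itself does not say how
"collie check" decides which replica is correct.

```python
import hashlib
from collections import Counter


def check_and_repair(replicas):
    """Hypothetical helper: make all copies of one object identical.

    `replicas` is a list of bytes objects, one per copy of the same object.
    Returns (repaired_replicas, indices_fixed).
    Assumption: the content held by a strict majority of copies is "good".
    Raises ValueError when there are no copies or no strict majority.
    """
    if not replicas:
        raise ValueError("no replicas to check")

    # Compare copies by digest rather than by full content.
    digests = [hashlib.sha1(data).hexdigest() for data in replicas]
    winner, votes = Counter(digests).most_common(1)[0]

    # Without a strict majority we cannot pick a copy automatically.
    if votes * 2 <= len(replicas):
        raise ValueError("no majority copy; manual inspection needed")

    good = replicas[digests.index(winner)]
    fixed = [i for i, d in enumerate(digests) if d != winner]
    # Overwrite every divergent copy with the majority content.
    repaired = [good if i in fixed else data
                for i, data in enumerate(replicas)]
    return repaired, fixed


if __name__ == "__main__":
    copies = [b"new data", b"new data", b"old data"]
    repaired, fixed = check_and_repair(copies)
    print("repaired replica indices:", fixed)
```

With only two copies that disagree, the sketch refuses to guess and
raises an error, which is the kind of case where a manual last-resort
tool would still be needed.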