On 06/25/2012 06:51 PM, Christoph Hellwig wrote:
> On Wed, Jun 20, 2012 at 06:16:02PM +0800, Liu Yuan wrote:
>> "With the following scenarios, object replicas could have different
>> contents:
>>
>> - a gateway node fails while forwarding write requests
>> - total node failure happens while writing objects
>>
>> In such cases, it is okay for VMs not to read the latest data from
>> the inconsistent objects because the VMs received EIO from them
>> before. However, it is still necessary to fix the objects' inconsistency
>> so that the VMs won't read different data from the objects next
>> time."
>>
>> So when either of those two cases happens, users are expected to run:
>>
>> $ collie check affected_vdi_name
>
> Requiring manual user intervention when a node goes down in a
> distributed storage system is entirely unacceptable. I'm happy to kill
> the dumb version of the consistency fix, but in exchange sheepdog needs
> to have a better internal method to deal with this failure instead of
> bailing out.
>

This command is a last resort for fixing consistency, like fsck, and it
does not preclude adding an automatic recovery mechanism built into the
sheep core.

The biggest reason to remove fix_object_consistency() is that it is
simply broken in some cases.

Thanks,
Yuan