[sheepdog] [PATCH 2/2] collie: add a check&repair command

Mon Jun 25 12:51:16 CEST 2012

On Wed, Jun 20, 2012 at 06:16:02PM +0800, Liu Yuan wrote:
>   "With the following scenarios, object replicas could have different
>    contents:
> 
>   - a gateway node fails while forwarding write requests
>   - total node failure happens while writing objects
> 
>   In such cases, it is okay for VMs not to read the latest data from
>   the inconsistent objects because the VMs received EIO from them
>   before.  However, it is still necessary to fix the objects' inconsistency
>   so that the VMs won't read different data from the objects next
>   time."
> 
> So when those two cases happen, users are expected to run:
> 
> $ collie check affected_vdi_name

Requiring manual user intervention when a node goes down in a
distributed storage system is entirely unacceptable.  I'm happy to kill
the dumb version of the consistency fix, but in exchange sheepdog needs
to have a better internal method to deal with this failure instead of
bailing out.