On 05/16/2012 08:36 AM, MORITA Kazutaka wrote:
> Even if we limit the object cache only to normal I/O requests, a
> similar problem still exists. For example:
>
> - there are two nodes (A and B) in the Sheepdog cluster
> - run a VM with write-cache enabled on node A
> - the VM crashes and restarts on node B

I think the upper layer (OpenStack fail-over management) should take
care of flushing the cache; we already have a mechanism to do this:
qemu-io -c "flush".

In the worst case, where the host node crashes into wreckage (can't
boot up again), users who enable the cache should tolerate losing the
dirty updates in it, because there is no way to retrieve those updates.
This is similar to the page cache and a disk cache without
power-failure protection: they don't promise anything about data
durability at all. Any request for persistent data should come with a
sync flag.

> - shut down the VM on node B and restart it on node A; the VM could
>   then read the old cache on node A
>
> It actually happens in real use cases that VMs are restarted on other
> nodes. For example, OpenStack (open source cloud software) selects
> the nodes where VMs start up with its own scheduler. So, IMHO,
> disabling the object cache for snapshots doesn't solve the
> fundamental cache coherency problem, and it cannot weaken the
> restriction that we shouldn't access the same image from different
> nodes.

It depends on the use case; I don't think we can expect the cache to
survive every crash. Data-sensitive applications should run their VMs
without the cache; they have that choice through the QEMU and sheep
options.

That being said, I am fine with taking your read_object() cache
change, but it should be considered a best effort to return the
freshest data. When users want a guarantee, they should avoid this
problem by correctly using the upper-layer tools.

Thanks,
Yuan
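
P.S. A sketch of the flush step the fail-over layer could run before
restarting the VM on another node. The host, port, and VDI name below
are made-up placeholders; substitute the values for your cluster:

    # flush the writeback object cache on node A before the VM
    # is started anywhere else
    $ qemu-io -c "flush" sheepdog:node-a:7000:test-vdi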
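
To illustrate the sync-flag point: a guest that needs durability
should ask for it explicitly rather than rely on the cache surviving a
crash. For example (file names are placeholders):

    # inside the guest: O_SYNC on every write ...
    $ dd if=payload.bin of=/mnt/data/payload.bin oflag=sync
    # ... or flush everything to stable storage afterwards
    $ sync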
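
And for data-sensitive setups, the cache can simply be avoided from
the command line. Roughly like this; the exact sheep flags vary
between versions, so treat it as a sketch:

    # writethrough on the QEMU side bypasses the writeback cache
    $ qemu-system-x86_64 -drive file=sheepdog:test-vdi,cache=writethrough ...
    # and start sheep without its object cache option (-w)
    $ sheep /var/lib/sheepdog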