On 05/16/2012 08:36 AM, MORITA Kazutaka wrote:
> Even if we limit the object cache only to normal I/O requests, a
> similar problem still exists. For example:
>
> - there are two nodes (A and B) in the Sheepdog cluster
> - run a VM with write-cache enabled on node A
> - the VM crashes and restarts on node B

I think the upper layer (OpenStack fail-over management) should take
care of flushing the cache; we already have a mechanism to do this:
qemu-io -c "flush".

In the worst case, where the host node crashes into wreckage (can't
boot up again), users who enable the cache should tolerate losing the
dirty updates in it, because there is no way to retrieve those updates.
This is similar to the page cache and a disk cache without
power-failure protection: they don't promise anything about data
durability at all. Any request for persistent data should come with a
sync flag.

> - shut down the VM on node B and restart it on node A; the VM could
>   then read the old cache on node A
>
> It actually happens in real use cases that VMs are restarted on other
> nodes. For example, OpenStack (open source cloud software) selects
> the nodes where VMs start up with its own scheduler. So, IMHO,
> disabling the object cache for snapshots doesn't solve the
> fundamental cache coherency problem, and it cannot weaken the
> restriction that we shouldn't access the same image from different
> nodes.

It depends on the use case; I don't think we can expect the cache to
survive every crash. Data-sensitive applications should run their VMs
without the cache; they have that choice through the QEMU and sheep
options.

That being said, I am fine with taking your read_object() cache
change, but it should be considered a best effort to return the
freshest data. When users want a guarantee, they should avoid this
problem by correctly using the upper-layer tools.

Thanks,
Yuan
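
P.S. A sketch of the flush step the fail-over layer could run before
restarting the VM on another node. The host, port, and VDI name below
are made-up placeholders; substitute the values for your cluster:

    # flush the writeback object cache on node A before the VM
    # is started anywhere else
    $ qemu-io -c "flush" sheepdog:node-a:7000:test-vdi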
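
To illustrate the sync-flag point: a guest that needs durability
should ask for it explicitly rather than rely on the cache surviving a
crash. For example (file names are placeholders):

    # inside the guest: O_SYNC on every write ...
    $ dd if=payload.bin of=/mnt/data/payload.bin oflag=sync
    # ... or flush everything to stable storage afterwards
    $ sync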
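
And for data-sensitive setups, the cache can simply be avoided from
the command line. Roughly like this; the exact sheep flags vary
between versions, so treat it as a sketch:

    # writethrough on the QEMU side bypasses the writeback cache
    $ qemu-system-x86_64 -drive file=sheepdog:test-vdi,cache=writethrough ...
    # and start sheep without its object cache option (-w)
    $ sheep /var/lib/sheepdog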