[Sheepdog] Sheepdog 0.3.0 schedule and 0.4.0 plan
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Tue Nov 15 15:06:19 CET 2011
At Tue, 15 Nov 2011 07:58:30 -0500,
Christoph Hellwig wrote:
>
> On Tue, Nov 15, 2011 at 08:47:24PM +0900, MORITA Kazutaka wrote:
> > The key idea in the above link is that, when writeback is enabled, a
> > gateway node can send write responses to VMs before replicating data
> > to storage nodes. Note that a VM sends write requests to one of the
> > Sheepdog nodes (the gateway node) first, and then that node replicates
> > the data to multiple nodes (the storage nodes). Even if we use this
> > approach, the gateway node can send the unstable write requests to the
> > storage nodes ASAP, before receiving flush requests. I think this
> > reduces the write latency when we use Sheepdog in a WAN environment.
>
> Okay, now I understand the idea. Yes, this sounds like a useful idea
> to me.
>
> > If the gateway node writes data to the mmapped area before sending
> > responses to VMs, we can regard the local mmapped file as a Sheepdog
> > disk cache - this is what I meant in the above link. This approach
> > may also reduce the read latency in a WAN environment.
>
> Any idea why you care about a mmapped area specifically? Shared
> writeable mmaps are a horrible I/O interface, most notably they don't
> allow for any kind of error handling. I would absolutely advise against
> using them for clustered storage.
It just looked simple to create a whole disk image file and use it
with mmap() as a disk cache, but it is probably a bad idea.
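For comparison, here is a rough sketch in Python of the same cache file
accessed with explicit pread/pwrite calls instead of mmap(). It is not
Sheepdog code: the FileCache class, its method names, and the sizes are
assumptions made up for this example. The point is only that a failed
write shows up as an error at the call site, where the gateway could
react to it, rather than somewhere outside the normal control flow.

```python
import os
import tempfile


class FileCache:
    """Hypothetical disk cache backed by a plain file (no mmap)."""

    def __init__(self, path, size):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Sparse file as large as the cached disk image.
        os.ftruncate(self.fd, size)

    def write(self, offset, data):
        # An I/O failure raises OSError right here.
        written = os.pwrite(self.fd, data, offset)
        if written != len(data):
            raise OSError("short write to cache file")

    def read(self, offset, length):
        return os.pread(self.fd, length, offset)

    def flush(self):
        # Errors while writing the data out are also reported to the caller.
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = FileCache(os.path.join(d, "vdi.cache"), 1 << 20)
        cache.write(4096, b"hello")
        cache.flush()
        print(cache.read(4096, 5))
        cache.close()
```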
>
> Except for that the idea sounds fine - I suspect making the gateway
> node use the same storage mechanism as "normal" endpoint nodes is going
> to both make the code simpler and easier to debug.
It is difficult to use the gateway node as a "normal" storage node
because a VM can use a different gateway after restarting or
migrating. The VM then cannot find the previous gateway, which holds
one of the replicas.
If we use the gateway as a "temporary" storage node that keeps one
extra replica for caching, it is very easy.
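To make this concrete, here is a minimal sketch in Python. It is not
Sheepdog code; StorageNode, WritebackGateway, and all method names are
hypothetical, and the network is replaced by in-memory dictionaries.
The gateway acknowledges a write once it is in its local cache, pushes
pending writes to the storage nodes in the background, and answers a
flush only after everything pending has been replicated. Because the
cache is an extra copy, a different gateway can still read the data
from the storage nodes.

```python
from collections import deque


class StorageNode:
    """Hypothetical storage node holding one replica of each object."""

    def __init__(self):
        self.objects = {}

    def write(self, oid, data):
        self.objects[oid] = data


class WritebackGateway:
    """Hypothetical gateway that keeps one extra, temporary replica."""

    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes
        self.cache = {}         # extra replica, used only as a cache
        self.pending = deque()  # unstable writes not yet replicated

    def write(self, oid, data):
        # Respond to the VM before replicating to the storage nodes.
        self.cache[oid] = data
        self.pending.append((oid, data))
        return "ack"

    def replicate_pending(self):
        # In a real system this would run ASAP in the background.
        while self.pending:
            oid, data = self.pending.popleft()
            for node in self.storage_nodes:
                node.write(oid, data)

    def flush(self):
        # A flush completes only after all pending writes are replicated.
        self.replicate_pending()
        return "ack"

    def read(self, oid):
        if oid in self.cache:
            return self.cache[oid]
        return self.storage_nodes[0].objects[oid]


if __name__ == "__main__":
    nodes = [StorageNode() for _ in range(3)]
    gw = WritebackGateway(nodes)
    gw.write(1, b"data")
    gw.flush()
    # A VM that migrated to another gateway still finds the data.
    print(WritebackGateway(nodes).read(1))
```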
Thanks,
Kazutaka