At Tue, 15 Nov 2011 07:58:30 -0500, Christoph Hellwig wrote:
>
> On Tue, Nov 15, 2011 at 08:47:24PM +0900, MORITA Kazutaka wrote:
> > The key idea in the above link is that, when writeback is enabled, a
> > gateway node can send write responses to VMs before replicating data
> > to storage nodes.  Note that a VM sends write requests to one of the
> > Sheepdog nodes (the gateway node) first, and that node then replicates
> > the data to multiple nodes (the storage nodes).  Even with this
> > approach, the gateway node can send the unstable write requests to the
> > storage nodes ASAP, before receiving flush requests.  I think this
> > reduces the write latency when we use Sheepdog in a WAN environment.
>
> Okay, now I understand the idea.  Yes, this sounds like a useful idea
> to me.
>
> > If the gateway node writes data to the mmapped area before sending
> > responses to VMs, we can regard the local mmapped file as a Sheepdog
> > disk cache - this is what I meant in the above link.  This approach
> > may also reduce the read latency in a WAN environment.
>
> Any idea why you care about an mmapped area specifically?  Shared
> writable mmaps are a horrible I/O interface; most notably they don't
> allow for any kind of error handling.  I would absolutely advise
> against using them for clustered storage.

It just looked simple to create a whole disk image file and use it with
mmap() as a disk cache, but it is probably a bad idea.

> Except for that the idea sounds fine - I suspect making the gateway
> node use the same storage mechanism as "normal" endpoint nodes is going
> to make the code both simpler and easier to debug.

It is difficult to use the gateway node as a "normal" storage node,
because a VM can use a different gateway after restarting or migrating,
and then it cannot find the previous gateway that holds one of the
replicas.  If we use the gateway only as a "temporary" storage node that
keeps one extra replica for caching, it becomes much easier.

Thanks,

Kazutaka
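
P.S.  To make the writeback idea above a bit more concrete, here is a
rough, self-contained sketch of a gateway write/flush path.  All names
(gateway_write, gateway_flush, send_to_storage_node, etc.) are made up
for illustration - this is not the actual Sheepdog code, just one way
the "ack first, replicate in the background, flush waits" behaviour
could look:

  /* Rough sketch only - hypothetical names, not the actual Sheepdog
   * code.  The gateway acknowledges a write to the VM as soon as the
   * data is in its local cache, pushes it to the storage nodes in the
   * background, and a flush waits for every outstanding replication. */

  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/types.h>

  #define CACHE_SIZE  (1 << 20)  /* 1 MiB toy cache on the gateway */
  #define BLOCK_SIZE  4096
  #define NR_COPIES   3          /* number of storage-node replicas */

  static char cache[CACHE_SIZE];

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
  static int nr_unstable;        /* writes not yet on the storage nodes */

  /* Stand-in for sending one copy to a storage node over the network. */
  static void send_to_storage_node(int copy, off_t off)
  {
      printf("copy %d: %d bytes stored at offset %lld\n",
             copy, BLOCK_SIZE, (long long)off);
  }

  static void *replicate(void *arg)
  {
      off_t off = *(off_t *)arg;
      free(arg);

      /* Push the unstable write to the storage nodes ASAP, without
       * making the VM wait for it. */
      for (int i = 0; i < NR_COPIES; i++)
          send_to_storage_node(i, off);

      pthread_mutex_lock(&lock);
      if (--nr_unstable == 0)
          pthread_cond_broadcast(&drained);
      pthread_mutex_unlock(&lock);
      return NULL;
  }

  /* Write path: copy into the local cache, ack to the VM immediately,
   * replicate in the background. */
  static void gateway_write(const void *buf, off_t off)
  {
      pthread_t tid;
      off_t *arg = malloc(sizeof(*arg));

      memcpy(cache + off, buf, BLOCK_SIZE);
      *arg = off;

      pthread_mutex_lock(&lock);
      nr_unstable++;
      pthread_mutex_unlock(&lock);

      pthread_create(&tid, NULL, replicate, arg);
      pthread_detach(tid);

      /* Returning here corresponds to sending the write response to
       * the VM before the data is stable on the storage nodes. */
  }

  /* Flush path: block until every unstable write has reached the
   * storage nodes. */
  static void gateway_flush(void)
  {
      pthread_mutex_lock(&lock);
      while (nr_unstable > 0)
          pthread_cond_wait(&drained, &lock);
      pthread_mutex_unlock(&lock);
  }

  int main(void)
  {
      char data[BLOCK_SIZE] = "guest data";

      gateway_write(data, 0);
      gateway_flush();
      return 0;
  }

The point of gateway_flush() here is that a guest flush still gives the
usual durability guarantee, even though individual write responses are
returned before the data reaches the storage nodes.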