[Sheepdog] Sheepdog 0.3.0 schedule and 0.4.0 plan

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Nov 15 12:47:24 CET 2011


At Tue, 15 Nov 2011 05:43:14 -0500,
Christoph Hellwig wrote:
> 
> On Tue, Nov 15, 2011 at 04:50:58PM +0800, Yibin Shen wrote:
> > On Tue, Nov 15, 2011 at 3:40 PM, Christoph Hellwig <hch at infradead.org> wrote:
> > > On Tue, Nov 15, 2011 at 11:05:33AM +0900, MORITA Kazutaka wrote:
> > >> The planned big features for 0.4.0 are as follows:
> > >>
> > >>  - add write-cache support
> > >>    http://lists.wpkg.org/pipermail/sheepdog/2011-October/001480.html
> > >
> > > That description sounds a bit odd.  I've started hacking support for
> > > a traditional writeback cache - that is, data still gets written out
> > > using normal write on all storage nodes, but we can skip the O_SYNC
> > > flag.  Then the client sends a cache flush command and we do a syncfs
> > does this 'client' mean the qemu block driver?
> 
> Yes.
> 
> > my other question is: if we only bypass the O_SYNC flag, without a
> > cache stored on local disk (or in memory), then lots of read traffic
> > may be transferred over the network, so how can we improve read-side
> > performance?
> > So I think we can't drop the cache prefetch or read-ahead support.
> 
> It's not going to help with read performance indeed.  In my benchmarks
> read performance wasn't a major issue, mostly because I always had
> a copy of all objects stored on the machine where qemu runs.
> 
> If you care about read performance having the objects locally is what
> you need - adding a config tweak that makes sure to keep a local copy
> of objects read at least once might be a good idea.
> 
> The proposal linked above probably isn't too beneficial for write
> performance, given that you only start pushing things to the network
> once the flush routine is called, and thus use a lot of bandwidth in
> the latency-critical flush roundtrip.  Sending unstable write requests
> to all nodes ASAP, and only doing the final sync on flush, will get
> much better performance.

The key idea in the above link is that, when writeback is enabled, a
gateway node can send write responses to VMs before replicating data
to storage nodes.  Note that a VM sends its write requests to one of
the Sheepdog nodes (the gateway node) first, and that node then
replicates the data to multiple nodes (the storage nodes).  Even with
this approach, the gateway node can still send the unstable write
requests to the storage nodes ASAP, before receiving flush requests.
I think this reduces write latency when Sheepdog is used in a WAN
environment.
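To make this concrete, here is a minimal sketch of that gateway write
path.  It is only an illustration; the structure and function names
(gateway_handle_write, reply_to_vm, forward_write_async, and so on)
are made up and are not the actual Sheepdog code.

    /* Writeback gateway path: acknowledge the VM first, then push the
     * unstable write to every replica as soon as possible. */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    struct write_req {
        uint64_t oid;       /* object id */
        const void *buf;
        size_t len;
        off_t offset;
    };

    /* Hypothetical helpers, shown only as prototypes. */
    void reply_to_vm(struct write_req *req, int result);
    void forward_write_async(int node_idx, struct write_req *req,
                             int use_o_sync);

    void gateway_handle_write(struct write_req *req, int nr_copies)
    {
        /* 1. Complete the request toward the VM immediately
         *    (writeback semantics). */
        reply_to_vm(req, 0);

        /* 2. Send the still-unstable write to all storage nodes right
         *    away, without O_SYNC, so the data is already on the wire
         *    before the VM issues a flush. */
        for (int i = 0; i < nr_copies; i++)
            forward_write_async(i, req, 0);
    }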

If the gateway node writes data to the mmapped area before sending
responses to VMs, we can regard the local mmapped file as a Sheepdog
disk cache - this is what I meant in the above link.  This approach
may also reduce read latency in a WAN environment.
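As a rough sketch of what such a local cache could look like, the
gateway could keep one mmapped file per object and serve writes (and
later reads) from the mapping before the data reaches the storage
nodes.  The object size, file layout, and function names below are
only assumptions for illustration, not the proposed implementation.

    /* Sketch only: a local mmapped object file used as the gateway's
     * disk cache.  Error handling is minimal. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define CACHE_OBJECT_SIZE (4 * 1024 * 1024)

    void *map_cached_object(const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return MAP_FAILED;
        if (ftruncate(fd, CACHE_OBJECT_SIZE) < 0) {
            close(fd);
            return MAP_FAILED;
        }
        void *p = mmap(NULL, CACHE_OBJECT_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);      /* the mapping stays valid after close() */
        return p;
    }

    /* Write path: copy into the mapping and answer the VM; replication
     * to the storage nodes can happen afterwards. */
    void cache_write(void *map, off_t off, const void *buf, size_t len)
    {
        memcpy((char *)map + off, buf, len);
    }

    /* Read path: serve the VM from the local mapping instead of going
     * over the network. */
    void cache_read(const void *map, off_t off, void *buf, size_t len)
    {
        memcpy(buf, (const char *)map + off, len);
    }

    /* Flush path: msync() makes the cached object stable on local disk. */
    int cache_flush(void *map)
    {
        return msync(map, CACHE_OBJECT_SIZE, MS_SYNC);
    }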

I think we can combine these optimizations with the traditional
writeback cache implementation: write requests are regarded as
completed as soon as the gateway node receives them, and the requests
are then replicated to the storage nodes ASAP without O_SYNC.
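On the storage-node side, the combined scheme could look roughly like
the following: object writes go through the page cache without O_SYNC,
and one syncfs() (the Linux-specific call Christoph mentions above)
per flush request makes all of them stable.  The function names and
request handling here are again only illustrative.

    /* Storage-node sketch: unstable object writes plus syncfs() on
     * flush.  syncfs() is Linux-specific (needs _GNU_SOURCE). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int store_write(const char *obj_path, const void *buf, size_t len,
                    off_t off)
    {
        /* No O_SYNC: the write may stay in the page cache. */
        int fd = open(obj_path, O_WRONLY | O_CREAT, 0600);
        ssize_t done;

        if (fd < 0)
            return -1;
        done = pwrite(fd, buf, len, off);
        close(fd);
        return done == (ssize_t)len ? 0 : -1;
    }

    int store_flush(int store_dir_fd)
    {
        /* One syncfs() per flush request makes every unstable write
         * on this node's object store stable at once. */
        return syncfs(store_dir_fd);
    }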

Thanks,

Kazutaka

> 
> > > system call on all nodes to make sure data is on stable store from
> > > .bdrv_co_flush.
> > >
> > > I've implemented the client side, and a local implementation of the
> > > cache flushing, but so far I've failed to find a way to forward it
> > > to each node that has an object for the given image exactly once.
> > >
> > > If you're interested I can send the current WIP patches for this out to
> > > the list.
> > we really appreciate it!
> 
> I'll try to get what I have into a somewhat presentable format.
> 


