[sheepdog] [PATCH] sheep: stop special casing local objects on gateways

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Fri Jun 8 18:18:52 CEST 2012


At Thu, 07 Jun 2012 10:20:49 +0800,
Liu Yuan wrote:
> 
> On 06/07/2012 09:28 AM, MORITA Kazutaka wrote:
> 
> > At Wed, 06 Jun 2012 18:52:16 +0800,
> > Liu Yuan wrote:
> >>
> >> On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
> >>
> >>> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
> >>>> The pro is code simplification, but you don't mention the cons. I
> >>>> think we should know how much performance we lose by adding network
> >>>> overhead to I/Os that happen to be local; this can easily be
> >>>> measured with the built-in tracer.
> >>>
> >>> What kind of setup do you want to see this on, e.g. what number of
> >>> nodes?
> >>>
> >>
> >>
> >> At your convenience :) Maybe two (a small cluster and a bigger one)
> >> would suffice. At first thought, with the cached FD pool we might not
> >> lose that much performance, but this is quite a radical design change,
> >> so numbers will definitely help get it in. Kazum, what do you think of
> >> this change?
> > 
> > I like the idea of removing object I/Os from the gateway worker threads.
> > 
> > How about something like the following change on top of Christoph's
> > patch?  It still has socket overhead, but the gateway can read
> > objects locally if they are available.
> > 
> > 
> > diff --git a/sheep/gateway.c b/sheep/gateway.c
> > index 8b7777c..1124f8d 100644
> > --- a/sheep/gateway.c
> > +++ b/sheep/gateway.c
> > @@ -24,6 +24,7 @@ int forward_read_obj_req(struct request *req)
> >  	struct sd_rsp *rsp = (struct sd_rsp *)&fwd_hdr;
> >  	uint64_t oid = req->rq.obj.oid;
> >  	int nr_copies, j;
> > +	struct sd_vnode *obj_vnodes[SD_MAX_COPIES];
> >  
> >  	memcpy(&fwd_hdr, &req->rq, sizeof(fwd_hdr));
> >  	fwd_hdr.flags |= SD_FLAG_CMD_IO_LOCAL;
> > @@ -37,6 +38,16 @@ int forward_read_obj_req(struct request *req)
> >  	 * reading base VM's COW objects
> >  	 */
> >  	j = random();
> > +
> > +	/* Read from local if possible */
> > +	oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes);
> > +	for (i = 0; i < nr_copies; i++) {
> > +		if (vnode_is_local(obj_vnodes[i])) {
> > +			j = i;
> > +			break;
> > +		}
> > +	}
> > +
> >  	for (i = 0; i < nr_copies; i++) {
> >  		int idx = (i + j) % nr_copies;
> >  		struct sd_vnode *v = oid_to_vnode(req->vnodes, oid, idx);
> 
> 
> So what makes read different from write? If we go this far, why not go
> one step further and keep the fully local read as before? It is only a
> couple more lines and the logic is straightforward. Shortcutting
> requests to local storage benefits small clusters (5 ~ 15 nodes) a lot;
> if we don't gain a real, big advantage over the shortcut, we should at
> least wait until that advantage is demonstrated with solid performance
> numbers. And I'd suggest considering this change after the June release.
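
For reference, the fully local shortcut Yuan describes would look
roughly like the sketch below.  read_object_local() is a hypothetical
stand-in for the local object-store read path, and the way nr_copies
is obtained is assumed; only oid_to_vnodes(), vnode_is_local() and
forward_read_obj_req() come from the patch above.

	/*
	 * Sketch only: if one of the object's vnodes is this node, skip
	 * the socket entirely and read from the local object store;
	 * otherwise fall back to forwarding over the network.
	 */
	static int gateway_read_obj(struct request *req)
	{
		struct sd_vnode *obj_vnodes[SD_MAX_COPIES];
		uint64_t oid = req->rq.obj.oid;
		int nr_copies = get_nr_copies(req->vnodes); /* assumed helper */
		int i;

		oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes);
		for (i = 0; i < nr_copies; i++) {
			if (vnode_is_local(obj_vnodes[i]))
				/* hypothetical local read, no socket involved */
				return read_object_local(oid, req->data,
							 req->rq.data_length,
							 req->rq.obj.offset);
		}

		return forward_read_obj_req(req);
	}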

Okay, let's leave it untouched for now.  I think the real advantage
would come from the optimization with the splice system call that
Christoph mentioned.  I'm looking forward to it.
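
My reading of the splice() idea, as a sketch rather than an actual
sheepdog patch: the gateway would move object data between the peer
sheep socket and the client socket through a pipe, so the payload never
has to be copied into userspace.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>

	/* Relay len bytes from from_fd to to_fd without a userspace copy. */
	static ssize_t relay_with_splice(int from_fd, int to_fd, size_t len)
	{
		int pipefd[2];
		ssize_t moved = 0;

		if (pipe(pipefd) < 0)
			return -1;

		while (moved < (ssize_t)len) {
			/* source fd -> pipe */
			ssize_t n = splice(from_fd, NULL, pipefd[1], NULL,
					   len - moved,
					   SPLICE_F_MOVE | SPLICE_F_MORE);
			if (n <= 0)
				break;
			/* pipe -> destination fd */
			while (n > 0) {
				ssize_t m = splice(pipefd[0], NULL, to_fd, NULL,
						   n,
						   SPLICE_F_MOVE | SPLICE_F_MORE);
				if (m <= 0)
					goto out;
				n -= m;
				moved += m;
			}
		}
	out:
		close(pipefd[0]);
		close(pipefd[1]);
		return moved;
	}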

Thanks,

Kazutaka


