[sheepdog] [PATCH] sheep: stop special casing local objects on gateways
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Jun 8 18:18:52 CEST 2012
At Thu, 07 Jun 2012 10:20:49 +0800,
Liu Yuan wrote:
>
> On 06/07/2012 09:28 AM, MORITA Kazutaka wrote:
>
> > At Wed, 06 Jun 2012 18:52:16 +0800,
> > Liu Yuan wrote:
> >>
> >> On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
> >>
> >>> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
> >>>> The pro is simplification of the code, but you don't mention the cons. I
> >>>> think we should know how much performance we lose by adding network
> >>>> overhead to I/Os that happen to be local; this can easily be collected
> >>>> with the built-in tracer.
> >>>
> >>> What kind of setup do you want to see this on, e.g. what number of
> >>> nodes?
> >>>
> >>
> >>
> >> At your convenience :) Maybe two (a small cluster and a bigger one) would
> >> suffice. With the cached FD pool, my quick guess is that we might not lose
> >> that much performance, but this is quite a radical change of design, and
> >> numbers will definitely help it get in. Kazum, what do you think of this
> >> change?
> >
> > I like the idea of removing object I/Os from the gateway worker threads.
> >
> > How about something like the following change on top of Christoph's
> > patch? It still has the socket overhead, but the gateway can read
> > objects locally if they are available.
> >
> >
> > diff --git a/sheep/gateway.c b/sheep/gateway.c
> > index 8b7777c..1124f8d 100644
> > --- a/sheep/gateway.c
> > +++ b/sheep/gateway.c
> > @@ -24,6 +24,7 @@ int forward_read_obj_req(struct request *req)
> >  	struct sd_rsp *rsp = (struct sd_rsp *)&fwd_hdr;
> >  	uint64_t oid = req->rq.obj.oid;
> >  	int nr_copies, j;
> > +	struct sd_vnode *obj_vnodes[SD_MAX_COPIES];
> >  
> >  	memcpy(&fwd_hdr, &req->rq, sizeof(fwd_hdr));
> >  	fwd_hdr.flags |= SD_FLAG_CMD_IO_LOCAL;
> > @@ -37,6 +38,16 @@
> >  	 * reading base VM's COW objects
> >  	 */
> >  	j = random();
> > +
> > +	/* Read from local if possible */
> > +	oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes);
> > +	for (i = 0; i < nr_copies; i++) {
> > +		if (vnode_is_local(obj_vnodes[i])) {
> > +			j = i;
> > +			break;
> > +		}
> > +	}
> > +
> >  	for (i = 0; i < nr_copies; i++) {
> >  		int idx = (i + j) % nr_copies;
> >  		struct sd_vnode *v = oid_to_vnode(req->vnodes, oid, idx);
>
>
> So what makes read different from write? If we go this far, why not go a
> step further and keep the local read path as before? It is only a couple
> more lines and the logic is straightforward. Shortcutting requests to
> local storage benefits small (5 ~ 15 node) clusters a lot; if we don't
> get a real, big advantage by dropping it, we should at least wait until
> that advantage is ready and solid, with performance numbers. And I'd
> suggest considering this change after the June release.
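(For reference, the shortcut being discussed would look roughly like the sketch
below: serve the read from the local store when one of the object's vnodes is
this node, and only fall back to forwarding over a socket otherwise.
read_local_object() and the wrapper name are placeholders for the daemon's
local object read path, not actual sheep functions.)

static int gateway_read_with_local_shortcut(struct request *req, int nr_copies)
{
	struct sd_vnode *obj_vnodes[SD_MAX_COPIES];
	uint64_t oid = req->rq.obj.oid;
	int i;

	oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes);

	for (i = 0; i < nr_copies; i++) {
		if (vnode_is_local(obj_vnodes[i]))
			/* A copy lives on this node: read it from the
			 * local store directly, no socket involved. */
			return read_local_object(req, oid);
	}

	/* No local copy: forward the read over the network as usual. */
	return forward_read_obj_req(req);
}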
Okay, let's leave it untouched for now. I think the real advantage
would be the splice system call optimization mentioned by Christoph.
I'm looking forward to it.
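(For reference, the splice idea is to move object data from a file descriptor
to the client socket without copying it through user space. One end of each
splice() call must be a pipe, so a pipe sits in between. A minimal standalone
sketch of the system call usage, not sheep code:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Move len bytes from obj_fd (an object file), starting at offset, to
 * sock_fd (the client socket) without a user-space copy. */
static int splice_obj_to_sock(int obj_fd, loff_t offset, size_t len, int sock_fd)
{
	int pipefd[2];
	int ret = 0;

	if (pipe(pipefd) < 0)
		return -1;

	while (len > 0) {
		/* file -> pipe */
		ssize_t n = splice(obj_fd, &offset, pipefd[1], NULL, len,
				   SPLICE_F_MOVE);
		if (n <= 0) {
			ret = -1;
			break;
		}
		len -= n;

		/* pipe -> socket */
		while (n > 0) {
			ssize_t m = splice(pipefd[0], NULL, sock_fd, NULL, n,
					   SPLICE_F_MOVE | SPLICE_F_MORE);
			if (m <= 0) {
				ret = -1;
				goto out;
			}
			n -= m;
		}
	}
out:
	close(pipefd[0]);
	close(pipefd[1]);
	return ret;
}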
Thanks,
Kazutaka