At Thu, 07 Jun 2012 10:20:49 +0800, Liu Yuan wrote: > > On 06/07/2012 09:28 AM, MORITA Kazutaka wrote: > > > At Wed, 06 Jun 2012 18:52:16 +0800, > > Liu Yuan wrote: > >> > >> On 06/06/2012 06:44 PM, Christoph Hellwig wrote: > >> > >>> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote: > >>>> The pros is simplification of code, but you don't say cons, I think we > >>>> should know the perf number we lost by adding a network overhead for IOs > >>>> happen to be local, this can be easily collected by built-in tracer. > >>> > >>> What kind of setup do you want to see this on, e.g. what number of > >>> nodes? > >>> > >> > >> > >> At your convenience :) Maybe two (small cluster and bigger one) would > >> suffice. With cached FD pool, we might not lose that much performance > >> for a quick thought, though, but this is quite radical change of design, > >> numbers will definitely help it in. Kazum, how do you think of this change? > > > > I like the idea to remove object I/Os in the gateway worker threads. > > > > How about something like the following change on top of the Christoph > > patch? It still has a socket overhead, but the gateway can read > > objects from local if they are available. > > > > > > diff --git a/sheep/gateway.c b/sheep/gateway.c > > index 8b7777c..1124f8d 100644 > > --- a/sheep/gateway.c > > +++ b/sheep/gateway.c > > @@ -24,6 +24,7 @@ int forward_read_obj_req(struct request *req) > > struct sd_rsp *rsp = (struct sd_rsp *)&fwd_hdr; > > uint64_t oid = req->rq.obj.oid; > > int nr_copies, j; > > + struct sd_vnode *obj_vnodes[SD_MAX_COPIES]; > > > > memcpy(&fwd_hdr, &req->rq, sizeof(fwd_hdr)); > > fwd_hdr.flags |= SD_FLAG_CMD_IO_LOCAL; > > @@ -37,6 +38,16 @@ int forward_read_obj_req(struct request *req) > > * reading base VM's COW objects > > */ > > j = random(); > > + > > + /* Read from local if possible */ > > + oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes); > > + for (i = 0; i < nr_copies; i++) { > > + if (vnode_is_local(obj_vnodes[i])) { > > + j = i; > > + break; > > + } > > + } > > + > > for (i = 0; i < nr_copies; i++) { > > int idx = (i + j) % nr_copies; > > struct sd_vnode *v = oid_to_vnode(req->vnodes, oid, idx); > > > So what makes read different than write? If we are even at this step, > why not step further to make a local read as before, just about more > couple of lines and the logic is straightforward? Shortcut of requests > to local storage benefit small sized(5 ~ 15) cluster a lot, if we don't > get a real big advantage over it, we at least should wait to see the > real advantage ready and solid with performance numbers. And I'd suggest > to consider this change after June release. Okay, let's leave it untouched for now. I think the real advantage would be the optimization with splice system call mentioned by Christoph. I'm looking forward to it. Thanks, Kazutaka |