[sheepdog] [PATCH] sheep: stop special casing local objects on gateways

Liu Yuan namei.unix at gmail.com
Thu Jun 7 04:20:49 CEST 2012


On 06/07/2012 09:28 AM, MORITA Kazutaka wrote:

> At Wed, 06 Jun 2012 18:52:16 +0800,
> Liu Yuan wrote:
>>
>> On 06/06/2012 06:44 PM, Christoph Hellwig wrote:
>>
>>> On Wed, Jun 06, 2012 at 11:56:53AM +0800, Liu Yuan wrote:
>>>> The pros is simplification of code, but you don't say cons, I think we
>>>> should know the perf number we lost by adding a network overhead for IOs
>>>> happen to be local, this can be easily collected by built-in tracer.
>>>
>>> What kind of setup do you want to see this on, e.g. what number of
>>> nodes?
>>>
>>
>>
>> At your convenience :) Maybe two (small cluster and bigger one) would
>> suffice. With cached FD pool, we might not lose that much performance
>> for a quick thought, though, but this is quite radical change of design,
>> numbers will definitely help it in. Kazum, how do you think of this change?
> 
> I like the idea to remove object I/Os in the gateway worker threads.
> 
> How about something like the following change on top of the Christoph
> patch?  It still has a socket overhead, but the gateway can read
> objects from local if they are available.
> 
> 
> diff --git a/sheep/gateway.c b/sheep/gateway.c
> index 8b7777c..1124f8d 100644
> --- a/sheep/gateway.c
> +++ b/sheep/gateway.c
> @@ -24,6 +24,7 @@ int forward_read_obj_req(struct request *req)
>  	struct sd_rsp *rsp = (struct sd_rsp *)&fwd_hdr;
>  	uint64_t oid = req->rq.obj.oid;
>  	int nr_copies, j;
> +	struct sd_vnode *obj_vnodes[SD_MAX_COPIES];
>  
>  	memcpy(&fwd_hdr, &req->rq, sizeof(fwd_hdr));
>  	fwd_hdr.flags |= SD_FLAG_CMD_IO_LOCAL;
> @@ -37,6 +38,16 @@ int forward_read_obj_req(struct request *req)
>  	 * reading base VM's COW objects
>  	 */
>  	j = random();
> +
> +	/* Read from local if possible */
> +	oid_to_vnodes(req->vnodes, oid, nr_copies, obj_vnodes);
> +	for (i = 0; i < nr_copies; i++) {
> +		if (vnode_is_local(obj_vnodes[i])) {
> +			j = i;
> +			break;
> +		}
> +	}
> +
>  	for (i = 0; i < nr_copies; i++) {
>  		int idx = (i + j) % nr_copies;
>  		struct sd_vnode *v = oid_to_vnode(req->vnodes, oid, idx);


So what makes read different than write? If we are even at this step,
why not step further to make a local read as before, just about more
couple of lines and the logic is straightforward? Shortcut of requests
to local storage benefit small sized(5 ~ 15) cluster a lot, if we don't
get a real big advantage over it, we at least should wait to see the
real advantage ready and solid with performance numbers. And I'd suggest
to consider this change after June release.

Thanks,
Yuan



More information about the sheepdog mailing list