[Sheepdog] [PATCH] object cache: add flush_and_delete operation

Tue Apr 3 19:51:51 CEST 2012

On 04/04/2012 01:19 AM, MORITA Kazutaka wrote:
> At Tue,  3 Apr 2012 16:03:57 +0800,
> Liu Yuan wrote:
>>
>> From: Liu Yuan <tailai.ly at taobao.com>
>>
>> If 1) VDI is opened without cache enabled and 2) we unfortunately have
>> a cache for it previously, we should flush the cache then delete it.
>>
>> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
>> ---
>>  sheep/object_cache.c |   42 ++++++++++++++++++++++++++++++++++++++++++
>>  sheep/sheep_priv.h   |    1 +
>>  sheep/store.c        |   30 +++++++++++++++++++++---------
>>  3 files changed, 64 insertions(+), 9 deletions(-)
>>
>> diff --git a/sheep/object_cache.c b/sheep/object_cache.c
>> index b59f8f7..e856be4 100644
>> --- a/sheep/object_cache.c
>> +++ b/sheep/object_cache.c
>> @@ -20,6 +20,7 @@
>>  #include <pthread.h>
>>  #include <errno.h>
>>  #include <sys/file.h>
>> +#include <dirent.h>
>>  
>>  #include "sheep_priv.h"
>>  #include "util.h"
>> @@ -526,6 +527,47 @@ void object_cache_delete(uint32_t vid)
>>  
>>  }
>>  
>> +int object_cache_flush_and_delete(struct object_cache *oc)
>> +{
>> +	DIR *dir;
>> +	struct dirent *d;
>> +	uint32_t vid = oc->vid;
>> +	uint32_t idx;
>> +	struct strbuf p;
>> +	int ret = 0;
>> +
>> +	strbuf_init(&p, PATH_MAX);
>> +	strbuf_addstr(&p, cache_dir);
>> +	strbuf_addf(&p, "/%06"PRIx32, vid);
>> +
>> +	dprintf("%"PRIx32"\n", vid);
>> +	dir = opendir(p.buf);
>> +	if (!dir) {
>> +		dprintf("%m\n");
>> +		ret = -1;
>> +		goto out;
>> +	}
>> +
>> +	while ((d = readdir(dir))) {
>> +		if (!strncmp(d->d_name, ".", 1))
>> +			continue;
>> +		idx = strtoul(d->d_name, NULL, 16);
>> +		if (idx == ULLONG_MAX)
>> +			continue;
>> +		if (push_cache_object(vid, idx, 1) != SD_RES_SUCCESS) {
>> +			dprintf("failed to push %"PRIx64"\n",
>> +				idx_to_oid(vid, idx));
>> +			ret = -1;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	object_cache_delete(vid);
>> +out:
>> +	strbuf_release(&p);
>> +	return ret;
>> +}
>> +
>>  int object_cache_init(const char *p)
>>  {
>>  	int ret = 0;
>> diff --git a/sheep/sheep_priv.h b/sheep/sheep_priv.h
>> index d687cc0..c01ee46 100644
>> --- a/sheep/sheep_priv.h
>> +++ b/sheep/sheep_priv.h
>> @@ -429,5 +429,6 @@ int object_cache_push(struct object_cache *oc);
>>  int object_cache_init(const char *p);
>>  int object_is_cached(uint64_t oid);
>>  void object_cache_delete(uint32_t vid);
>> +int object_cache_flush_and_delete(struct object_cache *oc);
>>  
>>  #endif
>> diff --git a/sheep/store.c b/sheep/store.c
>> index 6661f13..84bffc2 100644
>> --- a/sheep/store.c
>> +++ b/sheep/store.c
>> @@ -835,16 +835,28 @@ static int bypass_object_cache(struct sd_obj_req *hdr)
>>  {
>>  	uint64_t oid = hdr->oid;
>>  
>> -	/*
>> -	 * We assume the cached object is freshest, donot break it ever.
>> -	 * This assumption is useful for non-cache requests from collie,
>> -	 * which tries hard to get the newest data.
>> -	 */
>> -	if (object_is_cached(oid))
>> -		return 0;
>> +	if (!(hdr->flags & SD_FLAG_CMD_CACHE)) {
>> +		uint32_t vid = oid_to_vid(oid);
>> +		struct object_cache *cache;
>>  
>> -	if (!(hdr->flags & SD_FLAG_CMD_CACHE))
>> -		return 1;
>> +		cache = find_object_cache(vid, 0);
>> +		if (!cache)
>> +			return 1;
>> +		if (hdr->flags & SD_FLAG_CMD_WRITE) {
>> +			object_cache_flush_and_delete(cache);
> 
> Hmm, does this work well when multiple write requests arrive at the
> same time?  I cannot come up with a better approach, though.
> 

The first write will flush and delete the cache very early in startup.
I am not sure whether multiple writes will be hanging around at that point. There is
also a file lock that protects each object from concurrent access, so I think we could
take advantage of it (write a flock version of rmdir_r()).

> BTW, do we really need to split a object cache into 4 MB files?  It
> looks simpler and faster to use a single large and sparse file for the
> object caches (the file has the same size and content with the virtual
> disk) since we can avoid extra open/close calls.
> 

With 4M split objects, I think we can rely on the underlying file system to manage
object naming and to interact with the backend store easily, so using a single file
might NOT be simpler, AFAIK.

Maybe I am missing some idea of how a single file would manage the whole volume space.
For example:
1) How to handle a partial mapping (say disk space is limited and can hold only part
of the data)? This leads to 2).
2) How to do cache reclaim with a sparse file? Punch holes?
3) How to handle concurrent access to a single file efficiently?

Furthermore, I guess you would need a complex mechanism to map between oid and file
offset, with dirty data involved.
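
To make the mapping part concrete, here is a minimal sketch of only the
address arithmetic, without any dirty tracking or reclaim. It assumes a data
object's oid carries the vid in its upper 32 bits and the object index in
its lower 32 bits, and that each object covers 4 MB of the volume. The
constants and the *_sketch names are assumptions made for illustration, not
sheepdog's actual definitions.

```c
#include <stdint.h>

/* Assumed layout for illustration only. */
#define OBJ_SIZE_SKETCH		(UINT64_C(1) << 22)	/* 4 MB per object */
#define VID_SHIFT_SKETCH	32			/* vid in the upper bits */

/* Build an oid from a vid and an object index. */
static inline uint64_t idx_to_oid_sketch(uint32_t vid, uint32_t idx)
{
	return ((uint64_t)vid << VID_SHIFT_SKETCH) | idx;
}

/* Extract the vid from an oid. */
static inline uint32_t oid_to_vid_sketch(uint64_t oid)
{
	return (uint32_t)(oid >> VID_SHIFT_SKETCH);
}

/* Extract the object index from an oid. */
static inline uint32_t oid_to_idx_sketch(uint64_t oid)
{
	return (uint32_t)(oid & UINT32_MAX);
}

/* Byte offset of an object inside a single per-volume cache file. */
static inline uint64_t oid_to_offset_sketch(uint64_t oid)
{
	return (uint64_t)oid_to_idx_sketch(oid) * OBJ_SIZE_SKETCH;
}

/* Object index that a byte offset in the cache file falls into. */
static inline uint32_t offset_to_idx_sketch(uint64_t offset)
{
	return (uint32_t)(offset / OBJ_SIZE_SKETCH);
}
```

The arithmetic itself is easy. What the per-object files give us for free,
and what a single file would have to rebuild, is the record of which ranges
are present and which are dirty.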

Anyway, I think you'd better cook up patches to make your idea concrete. I don't think
saving a few system calls will result in a noticeable speedup when I/O is involved.

-- 
thanks,
Yuan