[Sheepdog] [PATCH v2] object cache: incorrect lock may lead to update lost

Liu Yuan namei.unix at gmail.com
Sun Apr 15 16:48:31 CEST 2012


On 04/15/2012 09:53 PM, Yunkai Zhang wrote:

> Currently, we use a dirty_rb tree to record which cache object
> have been updated.
> 
> Two kinds of threads will operate this dirty tree concurrently:
>  a. mulitple io-workers: write something to cache objects
>  b. one flush-worker: flush all updates to cluster
> 
> In the following scenario, update will be lost:
> 
> flush-worker                                  one io-worker
> [object_cache_push]                         [object_cache_rw]
>  |-(1) get a cache object from dirty tree                 |
>  |-(2) read the data file of this object                  |
>  |                    modify data file of this object (3)-|
>  |-(4) forward_write_obj_req()                            |
>  |                      add this object to dirty tree (5)-|
>  |-(6) rb_erase: remove this object from dirty tree       |
> 
> Note: io-worker generate *new update* in step (3), but flush-worker
> remove this cache object from dirty tree in step (6).
> 
> I use two dirty trees to fix this bug and avoid heavy lock between
> flush-worker and io-wroker threads.
> 
> There is only one *active* dirty tree for io-workers in any time.
> After io-worker modify something, it operate this active dirty tree.
> 
> When flush-worker want to flush all updates to cluster, it:
>  1. mark another tree as *active* dirty tree.
>  2. get update info from *inactive* dirty tree.
>  3. if something wrong occur in flushing process,
>     merge two dirty trees into active drity tree.
> 
> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
> ---
>  sheep/object_cache.c |  113 ++++++++++++++++++++++++++++++++++++++++---------
>  sheep/sheep_priv.h   |    9 +++-
>  2 files changed, 99 insertions(+), 23 deletions(-)
> 
> diff --git a/sheep/object_cache.c b/sheep/object_cache.c
> index df9ab49..c14ec52 100644
> --- a/sheep/object_cache.c
> +++ b/sheep/object_cache.c
> @@ -127,9 +127,16 @@ not_found:
>  		cache = xzalloc(sizeof(*cache));
>  		cache->vid = vid;
>  		create_dir_for(vid);
> -		cache->dirty_rb = RB_ROOT;
> +
> +		cache->dirty_trees[0] = RB_ROOT;
> +		cache->dirty_trees[1] = RB_ROOT;
> +		cache->active_dirty_tree = &cache->dirty_trees[0];
> +
> +		INIT_LIST_HEAD(&cache->dirty_lists[0]);
> +		INIT_LIST_HEAD(&cache->dirty_lists[1]);
> +		cache->active_dirty_list = &cache->dirty_lists[0];
> +
>  		pthread_mutex_init(&cache->lock, NULL);
> -		INIT_LIST_HEAD(&cache->dirty_list);
>  		hlist_add_head(&cache->hash, head);
>  	} else
>  		cache = NULL;
> @@ -138,19 +145,61 @@ out:
>  	return cache;
>  }
>  
> -static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx, int create)
> +static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx,
> +		struct object_cache_entry *entry)
>  {
> -	struct object_cache_entry *entry = xzalloc(sizeof(*entry));
> +	int create = 0;
> 
> -	entry->idx = idx;
> -	pthread_mutex_lock(&oc->lock);
> -	if (!dirty_tree_insert(&oc->dirty_rb, entry)) {
> +	if (!entry) {
> +		create = 1;
> +		entry = xzalloc(sizeof(*entry));
> +		entry->idx = idx;
> +	}
> +	if (!dirty_tree_insert(oc->active_dirty_tree, entry)) {
>  		if (create)
>  			entry->create = 1;
> -		list_add(&entry->list, &oc->dirty_list);
> +		list_add(&entry->list, oc->active_dirty_list);
>  	} else
>  		free(entry);
> -	pthread_mutex_unlock(&oc->lock);
> +}


It is okay for me to move lock out of the function since you can use it
with merge_dirty_tree_and_list() in a clean way.

But I think here you can not remove 'create' argument without breakage
of current logic for object handling because

'create' is needed to distinguish two kinds of situation:
 1) object cache layer itself handles the creation of object from Guests
for SD_OP_CREATE_AND_WRITE_OBJ even if object isn't hit in cache
 2) for regular R/W, we don't handling object creation, just pulls it
from the cluster in case of that object is not hit in cache

I'd suggest
 - keeping of 'create' argument for this function
 - would better rename the function as add_to_active_dirty_tree_and_list()

> +
> +static void switch_dirty_tree_and_list(struct object_cache *oc,
> +		struct rb_root ** inactive_dirty_tree,
> +		struct list_head **inactive_dirty_list)
> +{
> +	*inactive_dirty_list = oc->active_dirty_list;
> +	*inactive_dirty_tree = oc->active_dirty_tree;
> +
> +	if (oc->active_dirty_tree == &oc->dirty_trees[0]) {
> +		oc->active_dirty_list = &oc->dirty_lists[1];
> +		oc->active_dirty_tree = &oc->dirty_trees[1];
> +	} else {
> +		oc->active_dirty_list = &oc->dirty_lists[0];
> +		oc->active_dirty_tree = &oc->dirty_trees[0];
> +	}
> +}
> +
> +static inline void del_from_dirty_tree_and_list(
> +		struct object_cache_entry *entry,
> +		struct rb_root *inactive_dirty_tree,
> +		struct list_head *inactive_dirty_list)
> +{
> +	rb_erase(&entry->rb, inactive_dirty_tree);
> +	list_del(&entry->list);
> +}
> +


I think this would be a general function, so we'd better simply it
prototype it as

del_from_dirty_tree_and_list(struct object_cache_entry *entry,
			 struct rb_root *tree, struct list_head *list)

Thanks,
Yuan



More information about the sheepdog mailing list