[Sheepdog] [PATCH v2] object cache: incorrect lock may lead to update lost
Yunkai Zhang
yunkai.me at gmail.com
Sun Apr 15 18:59:36 CEST 2012
On Sun, Apr 15, 2012 at 10:48 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 04/15/2012 09:53 PM, Yunkai Zhang wrote:
>
>> Currently, we use a dirty_rb tree to record which cache object
>> have been updated.
>>
>> Two kinds of threads will operate this dirty tree concurrently:
>> a. mulitple io-workers: write something to cache objects
>> b. one flush-worker: flush all updates to cluster
>>
>> In the following scenario, update will be lost:
>>
>> flush-worker one io-worker
>> [object_cache_push] [object_cache_rw]
>> |-(1) get a cache object from dirty tree |
>> |-(2) read the data file of this object |
>> | modify data file of this object (3)-|
>> |-(4) forward_write_obj_req() |
>> | add this object to dirty tree (5)-|
>> |-(6) rb_erase: remove this object from dirty tree |
>>
>> Note: io-worker generate *new update* in step (3), but flush-worker
>> remove this cache object from dirty tree in step (6).
>>
>> I use two dirty trees to fix this bug and avoid heavy lock between
>> flush-worker and io-wroker threads.
>>
>> There is only one *active* dirty tree for io-workers in any time.
>> After io-worker modify something, it operate this active dirty tree.
>>
>> When flush-worker want to flush all updates to cluster, it:
>> 1. mark another tree as *active* dirty tree.
>> 2. get update info from *inactive* dirty tree.
>> 3. if something wrong occur in flushing process,
>> merge two dirty trees into active drity tree.
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
>> ---
>> sheep/object_cache.c | 113 ++++++++++++++++++++++++++++++++++++++++---------
>> sheep/sheep_priv.h | 9 +++-
>> 2 files changed, 99 insertions(+), 23 deletions(-)
>>
>> diff --git a/sheep/object_cache.c b/sheep/object_cache.c
>> index df9ab49..c14ec52 100644
>> --- a/sheep/object_cache.c
>> +++ b/sheep/object_cache.c
>> @@ -127,9 +127,16 @@ not_found:
>> cache = xzalloc(sizeof(*cache));
>> cache->vid = vid;
>> create_dir_for(vid);
>> - cache->dirty_rb = RB_ROOT;
>> +
>> + cache->dirty_trees[0] = RB_ROOT;
>> + cache->dirty_trees[1] = RB_ROOT;
>> + cache->active_dirty_tree = &cache->dirty_trees[0];
>> +
>> + INIT_LIST_HEAD(&cache->dirty_lists[0]);
>> + INIT_LIST_HEAD(&cache->dirty_lists[1]);
>> + cache->active_dirty_list = &cache->dirty_lists[0];
>> +
>> pthread_mutex_init(&cache->lock, NULL);
>> - INIT_LIST_HEAD(&cache->dirty_list);
>> hlist_add_head(&cache->hash, head);
>> } else
>> cache = NULL;
>> @@ -138,19 +145,61 @@ out:
>> return cache;
>> }
>>
>> -static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx, int create)
>> +static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx,
>> + struct object_cache_entry *entry)
>> {
>> - struct object_cache_entry *entry = xzalloc(sizeof(*entry));
>> + int create = 0;
>>
>> - entry->idx = idx;
>> - pthread_mutex_lock(&oc->lock);
>> - if (!dirty_tree_insert(&oc->dirty_rb, entry)) {
>> + if (!entry) {
>> + create = 1;
>> + entry = xzalloc(sizeof(*entry));
>> + entry->idx = idx;
>> + }
>> + if (!dirty_tree_insert(oc->active_dirty_tree, entry)) {
>> if (create)
>> entry->create = 1;
>> - list_add(&entry->list, &oc->dirty_list);
>> + list_add(&entry->list, oc->active_dirty_list);
>> } else
>> free(entry);
>> - pthread_mutex_unlock(&oc->lock);
>> +}
>
>
> It is okay for me to move lock out of the function since you can use it
> with merge_dirty_tree_and_list() in a clean way.
>
> But I think here you can not remove 'create' argument without breakage
> of current logic for object handling because
>
> 'create' is needed to distinguish two kinds of situation:
> 1) object cache layer itself handles the creation of object from Guests
> for SD_OP_CREATE_AND_WRITE_OBJ even if object isn't hit in cache
> 2) for regular R/W, we don't handling object creation, just pulls it
> from the cluster in case of that object is not hit in cache
When we pass NULL as entry's value, the new version of this function
will create a new object_cache_entry just list we pass 1 as create's
value in the old style.
Have any questions?
>
> I'd suggest
> - keeping of 'create' argument for this function
> - would better rename the function as add_to_active_dirty_tree_and_list()
>
>> +
>> +static void switch_dirty_tree_and_list(struct object_cache *oc,
>> + struct rb_root ** inactive_dirty_tree,
>> + struct list_head **inactive_dirty_list)
>> +{
>> + *inactive_dirty_list = oc->active_dirty_list;
>> + *inactive_dirty_tree = oc->active_dirty_tree;
>> +
>> + if (oc->active_dirty_tree == &oc->dirty_trees[0]) {
>> + oc->active_dirty_list = &oc->dirty_lists[1];
>> + oc->active_dirty_tree = &oc->dirty_trees[1];
>> + } else {
>> + oc->active_dirty_list = &oc->dirty_lists[0];
>> + oc->active_dirty_tree = &oc->dirty_trees[0];
>> + }
>> +}
>> +
>> +static inline void del_from_dirty_tree_and_list(
>> + struct object_cache_entry *entry,
>> + struct rb_root *inactive_dirty_tree,
>> + struct list_head *inactive_dirty_list)
>> +{
>> + rb_erase(&entry->rb, inactive_dirty_tree);
>> + list_del(&entry->list);
>> +}
>> +
>
>
> I think this would be a general function, so we'd better simply it
> prototype it as
>
> del_from_dirty_tree_and_list(struct object_cache_entry *entry,
> struct rb_root *tree, struct list_head *list)
>
> Thanks,
> Yuan
--
Yunkai Zhang
Work at Taobao
More information about the sheepdog
mailing list