[Sheepdog] [PATCH v2] object cache: incorrect lock may lead to update lost

Yunkai Zhang yunkai.me at gmail.com
Sun Apr 15 18:59:36 CEST 2012


On Sun, Apr 15, 2012 at 10:48 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 04/15/2012 09:53 PM, Yunkai Zhang wrote:
>
>> Currently, we use a dirty_rb tree to record which cache object
>> have been updated.
>>
>> Two kinds of threads will operate this dirty tree concurrently:
>>  a. mulitple io-workers: write something to cache objects
>>  b. one flush-worker: flush all updates to cluster
>>
>> In the following scenario, update will be lost:
>>
>> flush-worker                                  one io-worker
>> [object_cache_push]                         [object_cache_rw]
>>  |-(1) get a cache object from dirty tree                 |
>>  |-(2) read the data file of this object                  |
>>  |                    modify data file of this object (3)-|
>>  |-(4) forward_write_obj_req()                            |
>>  |                      add this object to dirty tree (5)-|
>>  |-(6) rb_erase: remove this object from dirty tree       |
>>
>> Note: io-worker generate *new update* in step (3), but flush-worker
>> remove this cache object from dirty tree in step (6).
>>
>> I use two dirty trees to fix this bug and avoid heavy lock between
>> flush-worker and io-wroker threads.
>>
>> There is only one *active* dirty tree for io-workers in any time.
>> After io-worker modify something, it operate this active dirty tree.
>>
>> When flush-worker want to flush all updates to cluster, it:
>>  1. mark another tree as *active* dirty tree.
>>  2. get update info from *inactive* dirty tree.
>>  3. if something wrong occur in flushing process,
>>     merge two dirty trees into active drity tree.
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
>> ---
>>  sheep/object_cache.c |  113 ++++++++++++++++++++++++++++++++++++++++---------
>>  sheep/sheep_priv.h   |    9 +++-
>>  2 files changed, 99 insertions(+), 23 deletions(-)
>>
>> diff --git a/sheep/object_cache.c b/sheep/object_cache.c
>> index df9ab49..c14ec52 100644
>> --- a/sheep/object_cache.c
>> +++ b/sheep/object_cache.c
>> @@ -127,9 +127,16 @@ not_found:
>>               cache = xzalloc(sizeof(*cache));
>>               cache->vid = vid;
>>               create_dir_for(vid);
>> -             cache->dirty_rb = RB_ROOT;
>> +
>> +             cache->dirty_trees[0] = RB_ROOT;
>> +             cache->dirty_trees[1] = RB_ROOT;
>> +             cache->active_dirty_tree = &cache->dirty_trees[0];
>> +
>> +             INIT_LIST_HEAD(&cache->dirty_lists[0]);
>> +             INIT_LIST_HEAD(&cache->dirty_lists[1]);
>> +             cache->active_dirty_list = &cache->dirty_lists[0];
>> +
>>               pthread_mutex_init(&cache->lock, NULL);
>> -             INIT_LIST_HEAD(&cache->dirty_list);
>>               hlist_add_head(&cache->hash, head);
>>       } else
>>               cache = NULL;
>> @@ -138,19 +145,61 @@ out:
>>       return cache;
>>  }
>>
>> -static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx, int create)
>> +static void add_to_dirty_tree_and_list(struct object_cache *oc, uint32_t idx,
>> +             struct object_cache_entry *entry)
>>  {
>> -     struct object_cache_entry *entry = xzalloc(sizeof(*entry));
>> +     int create = 0;
>>
>> -     entry->idx = idx;
>> -     pthread_mutex_lock(&oc->lock);
>> -     if (!dirty_tree_insert(&oc->dirty_rb, entry)) {
>> +     if (!entry) {
>> +             create = 1;
>> +             entry = xzalloc(sizeof(*entry));
>> +             entry->idx = idx;
>> +     }
>> +     if (!dirty_tree_insert(oc->active_dirty_tree, entry)) {
>>               if (create)
>>                       entry->create = 1;
>> -             list_add(&entry->list, &oc->dirty_list);
>> +             list_add(&entry->list, oc->active_dirty_list);
>>       } else
>>               free(entry);
>> -     pthread_mutex_unlock(&oc->lock);
>> +}
>
>
> It is okay for me to move lock out of the function since you can use it
> with merge_dirty_tree_and_list() in a clean way.
>
> But I think here you can not remove 'create' argument without breakage
> of current logic for object handling because
>
> 'create' is needed to distinguish two kinds of situation:
>  1) object cache layer itself handles the creation of object from Guests
> for SD_OP_CREATE_AND_WRITE_OBJ even if object isn't hit in cache
>  2) for regular R/W, we don't handling object creation, just pulls it
> from the cluster in case of that object is not hit in cache

When we pass NULL as entry's value,  the new version of this function
will create a new object_cache_entry just list we pass 1 as create's
value in the old style.

Have any questions?

>
> I'd suggest
>  - keeping of 'create' argument for this function
>  - would better rename the function as add_to_active_dirty_tree_and_list()
>
>> +
>> +static void switch_dirty_tree_and_list(struct object_cache *oc,
>> +             struct rb_root ** inactive_dirty_tree,
>> +             struct list_head **inactive_dirty_list)
>> +{
>> +     *inactive_dirty_list = oc->active_dirty_list;
>> +     *inactive_dirty_tree = oc->active_dirty_tree;
>> +
>> +     if (oc->active_dirty_tree == &oc->dirty_trees[0]) {
>> +             oc->active_dirty_list = &oc->dirty_lists[1];
>> +             oc->active_dirty_tree = &oc->dirty_trees[1];
>> +     } else {
>> +             oc->active_dirty_list = &oc->dirty_lists[0];
>> +             oc->active_dirty_tree = &oc->dirty_trees[0];
>> +     }
>> +}
>> +
>> +static inline void del_from_dirty_tree_and_list(
>> +             struct object_cache_entry *entry,
>> +             struct rb_root *inactive_dirty_tree,
>> +             struct list_head *inactive_dirty_list)
>> +{
>> +     rb_erase(&entry->rb, inactive_dirty_tree);
>> +     list_del(&entry->list);
>> +}
>> +
>
>
> I think this would be a general function, so we'd better simply it
> prototype it as
>
> del_from_dirty_tree_and_list(struct object_cache_entry *entry,
>                         struct rb_root *tree, struct list_head *list)
>
> Thanks,
> Yuan



-- 
Yunkai Zhang
Work at Taobao



More information about the sheepdog mailing list