[Sheepdog] [PATCH v4] remove oids from object list cache when deleting a vdi

Thu May 3 12:25:43 CEST 2012

v3 ---> v4

1. Add a patch to remove unused inode code from del_vdi().
2. Add a patch to fix a bug in deleting the base vdi, submitted once before.
3. Allocate the array deleted_oids[] on the heap instead of on the stack,
   to avoid a stack overflow.
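A minimal sketch of item 3, under a few assumptions that are not taken from the patch: a per-vdi limit of 2^20 data objects, an owner array like data_vdi_id[] in which only entries equal to the vdi's own id are deleted along with it, and an oid layout of (vid << 32) | index. SKETCH_MAX_DATA_OBJS and collect_deleted_oids() are hypothetical names. The point is the size: 2^20 uint64_t entries take 8 MiB, which is too much for a thread's stack, so the array comes from calloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-vdi limit on data objects (assumption). */
#define SKETCH_MAX_DATA_OBJS (1UL << 20)

/*
 * Hypothetical helper: collect the oids of the data objects owned by
 * vdi `vid`. As a local array this would need 8 MiB of stack, so it is
 * allocated on the heap instead. The caller must free() the result.
 * Returns NULL on allocation failure.
 */
static uint64_t *collect_deleted_oids(const uint32_t *data_vdi_id,
				      uint32_t vid, size_t *nr_out)
{
	uint64_t *oids;
	size_t nr = 0;

	assert(data_vdi_id != NULL && nr_out != NULL);
	oids = calloc(SKETCH_MAX_DATA_OBJS, sizeof(*oids));
	if (!oids)
		return NULL;

	for (size_t idx = 0; idx < SKETCH_MAX_DATA_OBJS; idx++) {
		/* Assumption: only objects owned by this vdi are deleted. */
		if (data_vdi_id[idx] == vid)
			/* Assumed oid layout: vdi id in the high 32 bits. */
			oids[nr++] = ((uint64_t)vid << 32) | (uint64_t)idx;
	}
	*nr_out = nr;
	return oids;
}
```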

This patch set is critical to the vdi deletion routine, so comments
are very welcome.

-----------------------------------------------------
This patch set aims to clean up the object list cache after a vdi
is deleted: the data objects of the deleted vdi, each identified by
a uint64_t oid, should be removed from the object list cache.

1. Why do we need to clear the object list cache?

The object list cache is used while the cluster is in recovery: it
provides the object list for request_obj_list(). Sheep uses this list
to determine which objects exist in the cluster, and then tries to
recover them while the cluster membership is changing.

When an object has been deleted, we should certainly remove it from
the object list cache as well, so that sheep does not try to recover
objects that are already deleted, which can waste a lot of time.
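To make the role of the cache concrete, here is a simplified stand-in, not the real objlist cache in sheep: a fixed-capacity sorted array of oids with binary search. All sketch_* names and SKETCH_CACHE_CAP are hypothetical. The list handed to recovery would be buf[0..nr), and removing a deleted oid keeps recovery from chasing it. Note that sketch_remove() reports failure when the node never cached the oid, which is exactly the situation described in section 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHE_CAP 1024 /* hypothetical fixed capacity */

/* Simplified stand-in for a node's object list cache: sorted oids. */
struct sketch_objlist {
	uint64_t buf[SKETCH_CACHE_CAP];
	size_t nr;
};

/* Binary search: index of oid if present, else its insertion point. */
static size_t sketch_find(const struct sketch_objlist *c, uint64_t oid)
{
	size_t lo = 0, hi = c->nr;

	assert(c->nr <= SKETCH_CACHE_CAP);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c->buf[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Cache an oid; returns false if already cached or the cache is full. */
static bool sketch_insert(struct sketch_objlist *c, uint64_t oid)
{
	size_t pos = sketch_find(c, oid);

	if (pos < c->nr && c->buf[pos] == oid)
		return false;
	if (c->nr == SKETCH_CACHE_CAP)
		return false;
	memmove(&c->buf[pos + 1], &c->buf[pos],
		(c->nr - pos) * sizeof(uint64_t));
	c->buf[pos] = oid;
	c->nr++;
	return true;
}

/*
 * Drop a deleted object so recovery no longer tries to fetch it.
 * Returns false if this node never cached the oid.
 */
static bool sketch_remove(struct sketch_objlist *c, uint64_t oid)
{
	size_t pos = sketch_find(c, oid);

	if (pos == c->nr || c->buf[pos] != oid)
		return false;
	memmove(&c->buf[pos], &c->buf[pos + 1],
		(c->nr - pos - 1) * sizeof(uint64_t));
	c->nr--;
	return true;
}
```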

2. What do we do currently?

Currently, I try to remove the deleted object from the object list
cache in store_remove_obj() after unlink() succeeds.

But there's a big problem:

If the cluster has changed one or more times before (nodes joined or
left), some data objects have migrated from one node to another. Let's
call them the 'old node' and the 'new node'. We neither removed the
object id from the objlist cache on the old node, nor put the object id
into the objlist cache on the new node. Here's the problem:

In store_remove_obj(), unlink() may succeed because the object does
exist on that node, but its object id may not be in that node's object
list cache; it may be in another node's cache instead.

PS. Migrating the object list cache during recovery is rather difficult,
   and not really necessary as long as no objects are removed from the cluster.

3. What does this patch set do?

After a vdi is deleted successfully, sheep notifies all the other nodes
of the deleted data objects, and every node that receives this message
tries to remove those objects from its object list cache.
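A minimal sketch of the receiving side, assuming a hypothetical message layout (a count followed by the oids) and a per-node cache modelled as a plain unordered array. struct sketch_del_notify, struct sketch_cache and sketch_handle_del_notify() are illustrative names only, not the patch's code. It shows why broadcasting addresses the migration problem from section 2: every node checks its own cache, so the entry is dropped wherever it happens to be cached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical notification payload: count followed by the oids. */
struct sketch_del_notify {
	uint32_t nr_oids;
	uint64_t oids[]; /* C99 flexible array member */
};

/* Minimal per-node cache: an unordered array of oids. */
struct sketch_cache {
	uint64_t *oids;
	size_t nr;
};

/* Remove one oid if present by moving the last entry into its slot. */
static int sketch_cache_del(struct sketch_cache *c, uint64_t oid)
{
	for (size_t i = 0; i < c->nr; i++) {
		if (c->oids[i] == oid) {
			c->nr--;
			c->oids[i] = c->oids[c->nr];
			return 1;
		}
	}
	return 0;
}

/*
 * Run on every node that receives the notification. Each node only
 * touches its own cache, so an oid that migrated during an earlier
 * recovery is removed on whichever node cached it. Returns how many
 * entries this node removed.
 */
static size_t sketch_handle_del_notify(struct sketch_cache *c,
				       const struct sketch_del_notify *msg)
{
	size_t removed = 0;

	assert(c != NULL && msg != NULL);
	for (uint32_t i = 0; i < msg->nr_oids; i++)
		removed += (size_t)sketch_cache_del(c, msg->oids[i]);
	return removed;
}
```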

There is still a small problem: after a vdi is deleted, but before the
notification message has been received by all the other nodes, cluster
recovery may happen in this time window and may try to recover objects
that are already deleted.

This problem is difficult to avoid in the current design, but this
patch set does its best to reduce the probability of recovering
deleted objects.

4. Summary of the patches

The first patch removes some inode code from del_vdi() that is no
longer used.

The second patch fixes a bug in deleting the base vdi, which I
submitted once before.

The third patch fixes a bug about nr_copies in delete_one. I noticed
that it hasn't been fixed in the master branch yet, and with this bug
my patch doesn't work at all, so I fixed it.

The fourth patch follows Liu Yuan's advice and renames process_work
and process_main to process_top and process_bottom.

The fifth patch makes process_bottom run in a worker thread for a
cluster request when the flag SD_FLAG_CMD_WORKER is set in the request
header.

The sixth patch does the actual clearing work.
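For the fifth patch, a sketch of the dispatch idea, using a plain POSIX thread as a stand-in for sheep's worker threads. SKETCH_FLAG_CMD_WORKER and its value, struct sketch_req, process_bottom_sketch() and dispatch_bottom() are all hypothetical; the real flag is SD_FLAG_CMD_WORKER as defined by the patch. Requests carrying the flag run their bottom half off the calling thread, so slow work such as removing many oids from the cache need not block it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_FLAG_CMD_WORKER 0x0100 /* hypothetical flag value */

/* Hypothetical request: header flags plus bookkeeping for the demo. */
struct sketch_req {
	uint16_t flags;
	int ran_in_worker;
	int done;
};

/* Thread that called dispatch_bottom() (plays the "main" thread). */
static pthread_t main_tid;

static void process_bottom_sketch(struct sketch_req *req)
{
	/* Record whether the bottom half ran outside the main thread. */
	req->ran_in_worker = !pthread_equal(pthread_self(), main_tid);
	req->done = 1;
}

static void *worker_fn(void *arg)
{
	process_bottom_sketch(arg);
	return NULL;
}

/*
 * Run the bottom half in a worker thread if the request carries
 * SKETCH_FLAG_CMD_WORKER, otherwise inline. Returns 0 on success.
 */
static int dispatch_bottom(struct sketch_req *req)
{
	assert(req != NULL);
	main_tid = pthread_self();

	if (req->flags & SKETCH_FLAG_CMD_WORKER) {
		pthread_t tid;

		if (pthread_create(&tid, NULL, worker_fn, req) != 0)
			return -1;
		/* Joined only to keep the sketch short. */
		return pthread_join(tid, NULL) == 0 ? 0 : -1;
	}
	process_bottom_sketch(req);
	return 0;
}
```

The sketch joins the thread immediately to stay short; an event-driven daemon would more likely be notified on completion than block in pthread_join().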

Thanks,
levin