v3 ---> v4
1. Add a patch to remove useless inode code in del_vdi().
2. Add a patch to fix a bug in deleting a base vdi, submitted once before.
3. Allocate the deleted_oids[] array on the heap instead of on the stack, to avoid a stack overflow.

This patch set is really critical to the vdi deletion routine, so I would appreciate some comments on it.

-----------------------------------------------------

This patch set aims to clear the object list cache after a vdi is deleted: the deleted data objects, each identified by a uint64_t object id, should be removed from the object list cache.

1. Why do we need to clear the object list cache?

The object list cache is used while the cluster is in recovery, to provide the object list for request_obj_list(). Sheep uses the list to determine which objects exist in the cluster, and then tries to recover them when the cluster membership changes. Once an object has been deleted, we should remove it from the object list cache, so that sheep does not try to recover already-deleted objects, which may take a long time.

2. What do we do currently?

Currently, I try to remove the deleted object from the object list cache in store_remove_obj() after unlink() succeeds. But there is a big problem: if the cluster has changed one or more times before (nodes joined or left), some data objects have migrated from one node to another. Call them the 'old node' and the 'new node'. We neither removed the object id from the objlist cache on the old node nor put it into the objlist cache on the new node. So in store_remove_obj(), unlink() may succeed because the object does exist on that node, but the object id may not exist in that node's object list cache; it may exist in another node's cache.

PS. Migrating the object list cache is rather difficult in recovery, and not really necessary as long as we do not remove any object from the cluster.

3. What does this patch set do?
After a vdi is deleted successfully, sheep notifies all the other nodes of all the deleted data objects; every node that receives this message tries to remove the objects from its object list cache.

There is also a small problem: after a vdi is deleted, and before the notification message has been received by all the other nodes, a cluster recovery may happen in this time window, and it may try to recover the already-deleted objects. This problem is difficult to avoid in the current situation, but this patch set at least reduces the probability of recovering deleted objects.

4. Summary of the patches

The first patch removes some inode code that is no longer used from del_vdi().
The second patch fixes a bug in deleting a base vdi, which I submitted once before.
The third patch fixes a bug about nr_copies in delete_one. I noticed that it hasn't been fixed in the master branch; with this bug my patch doesn't work at all, so I fixed it.
The fourth patch takes Liu Yuan's advice to rename process_work and process_main to process_top and process_bottom.
The fifth patch makes process_bottom run in a worker thread for cluster requests when the SD_FLAG_CMD_WORKER flag is set in the request header.
The sixth patch does the clearing work.

Thanks,
levin