[sheepdog] [PATCH v7] sheep/cluster: add distributed-lock implemented by zookeeper for object-storage

Liu Yuan namei.unix at gmail.com
Tue Dec 3 11:50:28 CET 2013


On Tue, Dec 03, 2013 at 06:15:53PM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai at taobao.com>
> 
> Implement the distributed lock with zookeeper
> (refer: http://zookeeper.apache.org/doc/trunk/recipes.html)
> 
> The routine is:
>         1. create a seq-ephemeral znode in the lock directory
> 	   (use lock-id as dir name)
>         2. take the znode with the smallest path as owner of the lock; the
> 	   other threads wait on a pthread_mutex_t (cluster_lock->wait)
>         3. if the owner of the lock releases it (or the owner is killed by
> 	   accident), zookeeper will trigger zk_watch(), which will wake up
> 	   all waiting threads to compete to become the new owner of the lock
> 
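Step 2 of the routine above can be sketched without a ZooKeeper connection. This is only an illustration, not code from the patch: lowest_child() and is_lock_owner() are hypothetical names, and the sketch assumes the child names share one prefix and end in ZooKeeper's fixed-width, zero-padded sequence number, so that plain strcmp() order matches creation order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the lexicographically smallest child name.
 * Assumption: all names share one prefix and carry a fixed-width,
 * zero-padded sequence suffix, so strcmp() order is creation order.
 */
size_t lowest_child(const char *const *children, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (strcmp(children[i], children[best]) < 0)
			best = i;
	return best;
}

/* Return 1 if 'mine' is the smallest child, i.e. the lock owner. */
int is_lock_owner(const char *const *children, size_t n, const char *mine)
{
	return strcmp(children[lowest_child(children, n)], mine) == 0;
}
```

A caller that is not the owner would then block on its wait primitive until the watch fires and repeat the check.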
> We use dlock_array to store pointers to the cluster_locks in this sheep
> daemon, so when a ZOO_DELETED_EVENT arrives the program wakes up all waiters
> (in this sheep daemon) that are sleeping on the lock id and lets them compete
> to become the new owner.
> 
> dlock_array is just a normal array using lock-id as index, so imagine a
> scenario: two threads (A and B) in one sheep daemon call zk_lock() for the same
> lock-id. They will create two znodes in zookeeper but set dlock_array[lock_id]
> to only one of them (for example, to B). After that, when ZOO_DELETED_EVENT
> comes, zk_waiter() will only wake up thread B, and thread A will sleep on
> '->wait' forever because no one can wake it up.
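The overwrite described above can be reproduced with a single-threaded toy model. Everything here is a hypothetical stand-in (struct waiter, dlock_slots, register_waiter(), wake_waiters()); it only shows why one slot per lock id loses the first waiter, and is not the patch's actual data structure.

```c
#include <assert.h>
#include <stddef.h>

#define DLOCK_SLOTS 16	/* hypothetical table size for the demo */

/* Stand-in for a thread sleeping on cluster_lock->wait. */
struct waiter {
	int woken;
};

/* One slot per lock id, as in a plain array indexed by lock id. */
static struct waiter *dlock_slots[DLOCK_SLOTS];

/* Record a waiter for lock 'id'; a second call overwrites the first. */
void register_waiter(unsigned id, struct waiter *w)
{
	assert(id < DLOCK_SLOTS);
	dlock_slots[id] = w;
}

/* Wake whatever the slot points to; return how many waiters were woken. */
int wake_waiters(unsigned id)
{
	assert(id < DLOCK_SLOTS);
	if (!dlock_slots[id])
		return 0;
	dlock_slots[id]->woken = 1;
	return 1;
}
```

Registering A and then B for the same id and calling wake_waiters() once marks only B as woken; A's pointer is no longer reachable from the table.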
> 
> We have two methods to solve this problem:
> 	A. use a more complicated structure instead of dlock_array to store
> 	   both A's and B's lock handles.
> 	B. add a lock to prevent A and B from calling zk_lock() at the same
> 	   time.
> We prefer method B because it also avoids creating too many files in a
> zookeeper directory, which would put too much pressure on the zookeeper server
> if the number of sheep daemons is huge. Therefore we add 'local_lock' to
> 'struct cluster_lock'.
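A rough sketch of the idea behind method B, assuming the local lock is held from acquire to release: threads of one daemon queue on a local mutex, so the daemon never has more than one znode outstanding per lock id. struct demo_lock and the demo_* functions are hypothetical, the znode is reduced to a counter, and nothing here talks to ZooKeeper.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 16

/* Hypothetical, simplified stand-in for 'struct cluster_lock'. */
struct demo_lock {
	pthread_mutex_t local_lock;	/* serializes threads of this daemon */
	int znodes_now;			/* znodes this daemon holds right now */
	int znodes_peak;		/* highest value znodes_now reached */
};

void demo_lock_acquire(struct demo_lock *l)
{
	pthread_mutex_lock(&l->local_lock);
	/* Only the holder of local_lock gets here and "creates a znode". */
	l->znodes_now++;
	if (l->znodes_now > l->znodes_peak)
		l->znodes_peak = l->znodes_now;
}

void demo_lock_release(struct demo_lock *l)
{
	/* "Delete the znode", then let the next local thread proceed. */
	l->znodes_now--;
	pthread_mutex_unlock(&l->local_lock);
}

static void *demo_worker(void *arg)
{
	struct demo_lock *l = arg;

	for (int i = 0; i < 1000; i++) {
		demo_lock_acquire(l);
		demo_lock_release(l);
	}
	return NULL;
}

/* Run up to DEMO_MAX_THREADS workers on one lock; return the peak. */
int demo_run(int nthreads)
{
	struct demo_lock l = { .znodes_now = 0, .znodes_peak = 0 };
	pthread_t tids[DEMO_MAX_THREADS];
	int started = 0;

	if (nthreads > DEMO_MAX_THREADS)
		nthreads = DEMO_MAX_THREADS;
	pthread_mutex_init(&l.local_lock, NULL);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, demo_worker, &l) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&l.local_lock);
	return l.znodes_peak;
}
```

Built with -pthread, demo_run(4) reports a peak of one, which is the property that keeps a single-slot table sufficient.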
> 
> v6 --> v7:
> 	1. change bucket number of lock table from 4097 to 1021
> 	2. use sd_hash() for lock table
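For illustration, mapping a 64-bit lock id to one of 1021 buckets might look like the sketch below. demo_hash_64() is a plain FNV-1a hash used as a stand-in and is not claimed to match sheepdog's sd_hash() or sd_hash_64(); lock_bucket() and LOCK_TABLE_BUCKETS are hypothetical names as well.

```c
#include <stdint.h>

#define LOCK_TABLE_BUCKETS 1021	/* bucket count from the changelog; prime */

/* FNV-1a over the 8 bytes of the key, low byte first (stand-in hash). */
uint64_t demo_hash_64(uint64_t key)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */

	for (int i = 0; i < 8; i++) {
		h ^= (key >> (8 * i)) & 0xff;
		h *= 0x100000001b3ULL;		/* FNV 64-bit prime */
	}
	return h;
}

/* Pick the bucket of the lock table for a given lock id. */
unsigned lock_bucket(uint64_t lock_id)
{
	return (unsigned)(demo_hash_64(lock_id) % LOCK_TABLE_BUCKETS);
}
```

Hashing the id before taking the remainder spreads consecutive lock ids across buckets instead of filling them in order.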

Applied after changing sd_hash() to sd_hash_64(), thanks!

Yuan


