<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">2013/11/26 MORITA Kazutaka <span dir="ltr"><<a href="mailto:morita.kazutaka@gmail.com" target="_blank">morita.kazutaka@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">At Mon, 25 Nov 2013 17:02:06 +0800,<br>
<div><div class="h5">Liu Yuan wrote:<br>
><br>
> On Mon, Nov 25, 2013 at 05:43:19PM +0900, MORITA Kazutaka wrote:<br>
> > At Mon, 25 Nov 2013 15:03:46 +0800,<br>
> > Robin Dong wrote:<br>
> > ><br>
> > > The present implementation of http/swift is not perfect: it can't create<br>
> > > too many containers or objects. So we want to store all objects in one<br>
> > > hyper volume vdi and use a new structure, 'obj-inode', to identify each object's offset<br>
> > > and length in this vdi, just like a local file system. To achieve this,<br>
> > > we need distributed locks to ensure that only one thread can create (or<br>
> > > delete) an 'obj-inode' in this vdi at a time.<br>
> > ><br>
> > > This patch set is an attempt to implement the distributed lock.<br>
> > ><br>
> > > If we add code in sheep/cluster/zookeeper.c and use the framework of<br>
> > > cluster to implement this distributed lock, then we have to add<br>
> > > implementations for corosync, local and shepherd. That's too complicated. So<br>
> > > what we need is to add lock.c in sheep/http/ and only use it in the http<br>
> > > interface.<br>
> ><br>
> > If possible, I'd rather not see zookeeper-specific code outside of<br>
> > sheep/cluster/zookeeper.c. Can we use an SD_OP_TYPE_CLUSTER operation<br>
> > for your purpose? It works like a cluster-wide distributed lock.<br>
> ><br>
> > For example, vdi creation works as follows.<br>
> ><br>
> > 1. When sheep receives an SD_OP_NEW_VDI operation, sheep calls<br>
> > cdrv->block() to block all the other cluster operations.<br>
> ><br>
> > 2. Sheep calls cluster_new_vdi() in sd_block_handler(). It is<br>
> > guaranteed that no other sheep calls sd_block_handler() at the same<br>
> > time. This is necessary here because sheepdog doesn't allow<br>
> > concurrent vdi creation requests.<br>
> ><br>
> > 3. All the sheep in the cluster call post_cluster_new_vdi() in<br>
> > sd_notify_handler(). It is usually used for notification or<br>
> > cleanups.<br>
> ><br>
><br>
> I don't think this approach is efficient, though it is simpler because we can<br>
> make use of the existing mechanism, since:<br>
><br>
> - it can't scale, because there is only one lock in the cluster.<br>
> Every object creation, even from different containers, will compete for<br>
> this lock.<br>
><br>
> - it can be affected by operations that are not even related to http. For example,<br>
> 'vdi create' will block the cluster, which means that until it unblocks the cluster,<br>
> we can't create or delete objects or containers at all.<br>
><br>
> I think a lock per operation is really needed. E.g., every container has its own lock,<br>
> so that objects can be created concurrently without interfering with other<br>
> containers.<br>
<br>
</div></div>Getting a distributed lock is an expensive operation and it can cause<br>
a severe performance problem if we do it for each object creation.<br>
Can we find another way? Sheepdog is not designed to allow concurrent<br>
write access.<br></blockquote><div><br></div><div>It will hurt performance if the objects are very small, but for big objects (1GB, 10GB, 100GB) we only need to hold the lock at the "create object inode" moment;<br></div><div>after that, the object-uploading operation does not need the lock.</div>
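<div><br></div><div>A minimal sketch of that idea, in Python rather than sheep's C: the lock is taken only while the obj-inode (offset and length) is allocated, and the upload itself runs outside it. ContainerLocks, HyperVolume and create_object are hypothetical names, threading.Lock only stands in for the zookeeper lock so the sketch runs locally, and one offset space per container is an assumption.</div>
<pre>
```python
import threading
from collections import defaultdict


class ContainerLocks:
    """Hypothetical stand-in for a per-container distributed lock.

    threading.Lock is used only so the sketch runs in one process; in a
    cluster it would be a zookeeper-backed lock keyed by container name.
    """

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, container):
        # Create the container's lock on first use, then reuse it.
        with self._guard:
            return self._locks.setdefault(container, threading.Lock())


class HyperVolume:
    """Toy model: objects of one container are placed back to back."""

    def __init__(self, locks):
        self._locks = locks
        self._next_offset = defaultdict(int)
        self.inodes = {}  # (container, name) maps to (offset, length)

    def create_object(self, container, name, length):
        # The lock covers only the obj-inode allocation.
        with self._locks.lock_for(container):
            offset = self._next_offset[container]
            self._next_offset[container] = offset + length
            self.inodes[(container, name)] = (offset, length)
        # The (possibly very long) upload would happen here, unlocked.
        return offset
```
</pre>
<div>Creations in different containers never wait for each other here, and a 100GB upload holds the lock no longer than a 1KB one.</div>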
<div><br></div><div>I have tested this zookeeper lock: it could lock/unlock 200 times per second, which I think is not too slow even for small objects.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
For example, how about determining one gateway based on the hash value<br>
of the requested container name, and forwarding write requests to the<br>
appropriate gateway so that all the objects in the same container are<br>
accessed from only one gateway?<br>
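<br>
As a rough sketch of that routing (in Python, with hypothetical names: pick_gateway and the<br>
gateway list are assumptions for illustration, not sheepdog APIs), every node could compute<br>
the owning gateway from the container name alone:<br>
<pre>
```python
import hashlib


def pick_gateway(container, gateways):
    """Deterministically map a container name to one gateway.

    Every node that sees the same gateway list picks the same gateway,
    so all writes for a container can be forwarded to a single place.
    """
    if not gateways:
        raise ValueError("no gateways available")
    # Sort so the result does not depend on the order of the list.
    ordered = sorted(gateways)
    digest = hashlib.sha1(container.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```
</pre>
With plain modulo hashing, many containers move to another gateway when the gateway list<br>
changes; a consistent-hash ring would move fewer.<br>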
<br>
Thanks,<br>
<br>
Kazutaka<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>--<br>Best Regard<br>Robin Dong
</div></div>