[sheepdog] [RFC] create onode before uploading object completed

Wed Jan 8 03:55:21 CET 2014

On Wed, Jan 08, 2014 at 11:12:41AM +0900, MORITA Kazutaka wrote:
> At Wed, 8 Jan 2014 09:34:00 +0800,
> Robin Dong wrote:
> > 
> > > At Mon, 6 Jan 2014 17:16:22 +0800,
> > > Robin Dong wrote:
> > > >
> > > > Hi All,
> > > >
> > > > At present, the implementation of the Swift interface for creating an object in
> > > > sheepdog is:
> > > >
> > > > 1. lock container
> > > > 2. check whether an onode with the same object name already exists.
> > > > 3. unlock container
> > > > 4. upload object
> > > > 5. create onode
> > > >
> > > > This sequence has a problem: if two clients upload the same object
> > > > concurrently, two objects with the same name will be created in the
> > > > container. To avoid duplicate names, we must put the "create onode"
> > > > operation inside the container lock region.
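The race in the current sequence can be replayed deterministically. The following minimal Python sketch models a container's onodes as a plain list (an assumption for illustration, not sheepdog's real data structure) and interleaves two clients so that both pass the existence check before either creates its onode:

```python
# Hypothetical model of one container: onodes are kept in a plain list, so
# nothing stops two entries with the same name from being appended.
onodes = []

def onode_exists(name):
    return name in onodes

def create_onode(name):
    onodes.append(name)

# Two clients, A and B, running the current sequence with this interleaving:
a_found = onode_exists("obj")  # A: steps 1-3, check under the container lock
b_found = onode_exists("obj")  # B: steps 1-3, check after A has unlocked
# ... both clients upload their data (step 4) without holding the lock ...
if not a_found:
    create_onode("obj")        # A: step 5, no lock held
if not b_found:
    create_onode("obj")        # B: step 5, no lock held

print(onodes.count("obj"))     # prints 2: two onodes named "obj"
```

Because the check and the creation are not in the same lock region, each client's check is already stale when it creates its onode.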
> > > >
> > > > Therefore we need to change the process of creating an object to:
> > > >
> > > > 1. lock container
> > > > 2. check whether the onode already exists.
> > > > 3. allocate data space for the object, create the onode, and write it out
> > > > 4. unlock container
> > > > 5. upload object
> > > >
> > > > This routine avoids uploading duplicate objects.
> > > >
> > > > There is one exception in the new routine: if the client halts the
> > > > upload, we will be left with an "upload incomplete" onode.
> > > > I think this problem is easy to solve: we can add a status code to the
> > > > onode. A new onode will be set to "INIT", and after the upload
> > > > completes, the onode will be set to "COMPLETED".
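A minimal sketch of the status code, assuming a hypothetical `OnodeStatus` enum and a hypothetical `visible_objects()` listing helper (neither is existing sheepdog code). Onodes left in INIT by an aborted upload are simply not reported:

```python
from enum import Enum

class OnodeStatus(Enum):
    INIT = 0       # onode created and space allocated; upload not finished
    COMPLETED = 1  # upload finished

def visible_objects(onodes):
    """Return the sorted names of objects whose upload has completed.

    `onodes` maps object name -> OnodeStatus; onodes still in INIT (for
    example after a client aborted the upload) are left out.
    """
    return sorted(name for name, status in onodes.items()
                  if status is OnodeStatus.COMPLETED)
```

For example, `visible_objects({"a": OnodeStatus.COMPLETED, "b": OnodeStatus.INIT})` returns `["a"]`. Both values fit in a single byte, which matters for the cost of the final status update.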
> > >
> > > Then, the procedure will be as follows?
> > >
> > >   1. lock container
> > >   2. check whether the onode already exists.
> > >   3. allocate data space for the object, create the onode, and write it out
> > >   4. mark the onode as "INIT"
> > >   5. unlock container
> > >   6. upload object
> > >   7. mark the onode as "COMPLETED"
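The seven steps can be sketched in Python as follows. `Container`, `put_object()` and the `upload` callable are hypothetical, and `threading.Lock` only stands in for sheepdog's distributed container lock. This sketch folds steps 3 and 4 into one write by creating the onode already marked INIT:

```python
import threading

INIT, COMPLETED = "INIT", "COMPLETED"

class Container:
    """Hypothetical in-memory container; threading.Lock stands in for
    sheepdog's distributed container lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.onodes = {}  # object name -> onode record

    def put_object(self, name, upload):
        """Create `name` following the proposed steps; `upload` is a
        hypothetical callable that transfers the data and may raise."""
        with self.lock:                                # 1. lock container
            if name in self.onodes:                    # 2. onode exists?
                raise FileExistsError(name)
            onode = {"name": name, "status": INIT}     # 3-4. create, mark INIT
            self.onodes[name] = onode
        # 5. the container is unlocked on leaving the with-block
        upload()                                       # 6. upload object
        onode["status"] = COMPLETED                    # 7. mark COMPLETED
        return onode
```

Only the check and the creation run under the lock, so two clients racing on the same name can no longer both succeed, and a client that aborts during the upload leaves an onode still marked INIT.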
> > >
> > > I'm not against this suggestion, but I'm wondering whether we can get
> > > enough performance with this approach.  IIUC, this introduces an
> > > additional write request to the created onode at step 7.
> > >
> > 
> > Hi MORITA,
> > 
> > At step 7 we only need to write back the onode's status (perhaps a
> > "uint8_t" field), so performance should not be hurt too much.
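To illustrate why step 7 can be cheap, here is a sketch that writes back only the one-byte status with `os.pwrite()` (POSIX only). The on-disk layout in `ONODE_FMT` (a 256-byte name, an 8-byte size, then a 1-byte status) is a hypothetical stand-in, not sheepdog's actual onode format:

```python
import os
import struct

# Hypothetical onode layout: 256-byte name, 8-byte size, 1-byte status,
# little-endian with no padding ("<").
ONODE_FMT = "<256sQB"
STATUS_OFFSET = struct.calcsize("<256sQ")  # byte offset of the status field
INIT, COMPLETED = 0, 1

def write_onode(fd, name, size):
    """Step 3: write the whole onode once, with its status set to INIT."""
    os.pwrite(fd, struct.pack(ONODE_FMT, name.encode(), size, INIT), 0)

def mark_completed(fd):
    """Step 7: write back only the 1-byte status, not the whole onode."""
    return os.pwrite(fd, struct.pack("<B", COMPLETED), STATUS_OFFSET)
```

In this layout, step 7 costs a single one-byte write at a fixed offset rather than a rewrite of the whole onode.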
> > 
> > 
> > > I've been evaluating Swift these days and noticed that Swift can
> > > process thousands of PUT requests per second with only 3 nodes and 100
> > > disks.  Can Sheepdog achieve similar or better performance with this
> > > proposal?
> > >
> > 
> > At present, the bottleneck of Swift on sheepdog is the distributed lock on
> > each container. Therefore, if we send PUT requests to one hundred
> > different containers, I think sheepdog could achieve similar performance
> > with this proposal.
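The per-container nature of the bottleneck can be shown with a small sketch: a hypothetical table of 100 container locks, where `threading.Lock` stands in for sheepdog's distributed lock. A second PUT to the same container cannot take the lock, while a PUT to a different container can:

```python
import threading

# Hypothetical stand-in: one lock per container, 100 containers.
# In sheepdog the container lock is a cluster-wide distributed lock.
container_locks = {f"container-{i}": threading.Lock() for i in range(100)}

def try_begin_put(container):
    """Try to take the container lock without blocking; True on success."""
    return container_locks[container].acquire(blocking=False)

def end_put(container):
    """Release the container lock once the PUT's locked section is done."""
    container_locks[container].release()
```

PUTs spread over many containers therefore do not serialize against each other, whereas PUTs into a single container all queue on one lock.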
> 
> Okay. I think this should be documented somewhere.  I know some
> benchmark tools for Swift that try to create many objects in one
> container.  Note that Swift shows high performance even if all the
> requests go to a single container.

I guess you can test it with the local driver + sheepdog http, which should
give the current maximum throughput; the lock overhead there would be small
enough to evaluate the performance of the http code.

Thanks
Yuan