[sheepdog] [RFC] create onode before uploading object completed

Liu Yuan namei.unix at gmail.com
Mon Jan 6 10:30:49 CET 2014


On Mon, Jan 06, 2014 at 05:16:22PM +0800, Robin Dong wrote:
> Hi All,
> 
> At present, the implementation of the Swift interface for creating an
> object in sheepdog is:
> 
> 1. lock container
> 2. check whether an onode with the same object name already exists
> 3. unlock container
> 4. upload object
> 5. create onode
> 
> This sequence has a problem: if two clients upload the same object
> concurrently, two objects with the same name will be created in the
> container. To avoid duplicated names, we must put the "create onode"
> operation inside the container lock region.
> 
> Therefore we need to change the process of creating an object to:
> 
> 1. lock container
> 2. check whether the onode already exists
> 3. allocate data space for the object, create the onode, then write it to disk
> 4. unlock container
> 5. upload object
> 
> This routine avoids creating duplicated objects; a rough sketch follows.
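>
> In C, the new flow could look roughly like the sketch below. All the
> helper names and the SD_RES_OBJ_TAKEN code are illustrative assumptions,
> not existing sheepdog functions:
>
>     /* Sketch of the proposed create path; helpers are hypothetical. */
>     static int create_object(struct container *c, const char *name,
>                              const void *data, size_t len)
>     {
>         struct onode *onode;
>         int ret;
>
>         lock_container(c);                     /* step 1 */
>         if (onode_lookup(c, name)) {           /* step 2 */
>             unlock_container(c);
>             return SD_RES_OBJ_TAKEN;           /* lost the race */
>         }
>         /* step 3: reserve space, create the onode in INIT state and
>          * persist it while still holding the container lock */
>         onode = onode_create(c, name, len, ONODE_INIT);
>         unlock_container(c);                   /* step 4 */
>         if (!onode)
>             return SD_RES_NO_SPACE;
>
>         ret = upload_object(onode, data, len); /* step 5 */
>         if (ret != SD_RES_SUCCESS)
>             return ret;  /* onode stays INIT; user can DELETE and retry */
>
>         return onode_set_status(onode, ONODE_COMPLETED);
>     }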
> 
> There is one exception in the new routine: if the client aborts the upload,
> we will be left with an "upload uncompleted" onode.
> I think this problem is easy to solve: we can add a status field to the
> onode. A new onode will be set to "INIT", and after the upload completes,
> the onode will be set to "COMPLETED".
> So, when a user tries to GET an uncompleted object through the Swift
> interface, sheep will find that the onode is still "INIT", meaning "not
> completed", return "partial content" for the HTTP request, and the user
> can DELETE the object and upload it again.
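>
> For example, the status field and the GET-side check might look like this
> (the onode layout, the PARTIAL_CONTENT status and the helpers around it
> are assumptions of mine, not the actual sheep code):
>
>     enum onode_status {
>         ONODE_INIT,        /* created, upload not finished yet */
>         ONODE_COMPLETED,   /* upload finished, safe to GET */
>     };
>
>     struct onode {
>         char name[256];    /* object name */
>         uint8_t status;    /* enum onode_status */
>         /* ... data extents, size, ctime, etc. ... */
>     };
>
>     static int get_object(struct http_request *req, struct onode *onode)
>     {
>         if (onode->status != ONODE_COMPLETED) {
>             /* upload never finished: report partial content so the
>              * user can DELETE the object and upload it again */
>             http_response_header(req, PARTIAL_CONTENT);
>             return -1;
>         }
>         return serve_object(req, onode);  /* hypothetical helper */
>     }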
> 
> 
> Is there any suggestion for this new design ?
> 

I think this RFC means much more than solving that one problem: it basically
changes the semantics of simultaneous PUT, and the GET semantics a bit.

"
Amazon's semantics for simultaneous PUT:

Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket.

Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead.
"

But with this RFC,
- we add basic object locking for multiple writers, so that only the *first*
  one can succeed.
- we add an extra error code for GET: it can return 'partial content' if the
  object is not fully created (see the example below).
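
Roughly, the client-visible behavior would become (the 409 for the losing
PUT is my assumption; the RFC only specifies 'partial content' for GET):

    PUT /v1/a/c/obj    client A                    -> 201 Created
    PUT /v1/a/c/obj    client B, racing with A     -> 409 Conflict
    GET /v1/a/c/obj    before A finishes uploading -> 206 Partial Content
    GET /v1/a/c/obj    after A finishes            -> 200 OK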

This change looks good to me because
- we don't need to handle partial writes (due to power failure, system error,
  etc.) and can return 'partial content' to the client directly
- it saves clients from having to take care of concurrent puts of the same
  object, and makes sure only the earliest put will succeed.

What do others think of it?

Thanks
Yuan


