[Sheepdog] Some setattr/getattr strangeness

Thu Oct 13 14:30:44 CEST 2011

At Thu, 13 Oct 2011 13:02:48 +0100,
Chris Webb wrote:
> 
> MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:
> 
> > Yes, as long as setattr -x is run on the same machine.  Note that
> > Sheepdog object storage doesn't allow concurrent access from
> > multiple machines.
> 
> Hi Kazutaka. Having this restriction apply to setattr -x makes the exclusiveness of the
> operation much less useful: if it's only exclusive on a single machine, one
> could equivalently just use fcntl() on a lock file, which is cheaper and more
> convenient!

Hmm, yes, you are right.
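
For comparison, the single-machine alternative can be sketched with POSIX fcntl() record locks. This is a minimal Python sketch using a hypothetical lock-file path, not anything in Sheepdog or collie; on a local filesystem it only excludes other processes on the same machine, which is exactly the limitation discussed above.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking fcntl() record lock on `path`.

    Returns the open file descriptor on success; keep it open to hold the
    lock. Returns None if another process already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() is Python's wrapper around the fcntl() locking calls;
        # LOCK_NB makes it fail immediately instead of waiting for the holder.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another process holds the lock (EACCES or EAGAIN).
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A result of None from try_lock() means another local process holds the lock. POSIX record locks belong to the process, so a second try_lock() from the same process also succeeds, and closing any descriptor for the file drops the process's lock on it.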

> 
> I think the semantics for setattr -x were intended to allow it to be used to
> implement the kind of exclusive locking that Sheepdog requires elsewhere
> throughout the system to work correctly: claim the lock exclusively with the
> same convention for the lockfile everywhere, and you know you can safely
> access the vdi without causing divergence. In the absence of this, automated
> users of sheepdog would need to implement a separate global locking
> mechanism (on top of corosync, say) to be able to use sheepdog safely.
> 
> If setattr -x works atomically on a single node and only breaks down when
> there are multiple nodes trying to setattr -x, could one easily fix
> this by always forwarding setattr from the local sheep to the (guaranteed
> unique) group leader sheep rather than just executing it locally like a
> normal vdi write?

Sheepdog uses corosync multicast for all global atomic operations,
so I think the correct way is to implement an SD_OP_ATOMIC_WRITE_OBJ
operation on top of the multicast.
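
A minimal Python sketch of why a totally ordered multicast yields a cluster-wide atomic setattr -x, assuming -x means "create the attribute only if it does not already exist". The names AtomicWrite, Node and multicast_in_total_order are hypothetical; the delivery loop merely stands in for corosync's agreed (total) ordering and is neither the corosync API nor Sheepdog's actual SD_OP_ATOMIC_WRITE_OBJ implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AtomicWrite:
    """Hypothetical request: create attribute `key` only if it is absent."""
    sender: str
    key: str
    value: bytes


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    results: dict = field(default_factory=dict)  # sequence number -> success

    def deliver(self, seq, msg):
        # Deterministic rule that depends only on local state, which every
        # node has built from the same prefix of the total order.
        if msg.key in self.attrs:
            self.results[seq] = False
        else:
            self.attrs[msg.key] = (msg.sender, msg.value)
            self.results[seq] = True


def multicast_in_total_order(nodes, messages):
    """Stand-in for totally ordered delivery: every node is handed the
    same messages in the same order."""
    for seq, msg in enumerate(messages):
        for node in nodes:
            node.deliver(seq, msg)
```

Because every node applies the same requests in the same order with the same deterministic rule, they all reach the same verdict without a further round of coordination: whichever request is ordered first wins, and each sender learns the outcome from its own local delivery.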

But this limits the size of a vdi attribute to the maximum multicast
message size (a few hundred KB?).  Is that okay for you?

Thanks,

Kazutaka