[Sheepdog] Need inputs on performing some operations atomically

Narendra Prasad Madanapalli narendramind at gmail.com
Sun Sep 19 05:00:16 CEST 2010

Thanks Kazutaka.

I need some more clarification on epoch, objects & cluster nodes. I
have observed the following:

1. When a single node is started, /store_dir contains only epoch/,
obj/, & sheep.log
2. Then I ran format copies=3; it created a file epoch/00000001 and
a dir obj/00000001/
3. Next I started a new node; it created the following files on the new node:

4. At this stage, I created a new VM. This creates a vdi on either node.

5. On the 2nd node I brought the network down. The following additional
files were created:
1st Node:

2nd Node:

However, this is still a gray area to me, as I cannot get a clear idea of
what is happening. I understand sd_deliver is responsible for cluster events
and sd_conchg is for any configuration changes in cluster nodes.

It would be great if you could provide insights into the algorithm and
details of the relationship among epoch, obj & cluster nodes. I believe
this would shed light on solving the atomic operation problems.


On Wed, Sep 15, 2010 at 2:42 AM, MORITA Kazutaka
<morita.kazutaka at lab.ntt.co.jp> wrote:
> At Sun, 12 Sep 2010 19:41:34 +0530,
> Narendra Prasad Madanapalli wrote:
>> Hi,
>> I found there are two functions that are to be executed atomically in
>> sheep. These functions are below:
>> 1. sheep/store.c:
>>                         /* FIXME: need to update atomically */
>> /*                      ret = verify_object(fd, NULL, 0, 1); */
>> /*                      if (ret < 0) { */
>> /*                              eprintf("failed to set checksum, %"PRIx64"\n", oid); */
>> /*                              ret = SD_RES_EIO; */
>> /*                              goto out; */
>> /*                      } */
>> 2. sheep/vdi.c:
>> /* TODO: should be performed atomically */
>> static int create_vdi_obj(uint32_t epoch, char *name, uint32_t new_vid,
>>                           uint64_t size, uint32_t base_vid, uint32_t cur_vid,
>>                           uint32_t copies, uint32_t snapid, int is_snapshot)
>> {
>> My understanding is that these two functions get executed in
>> worker_routine() in response to queue_request() & queue_work().
>> Solution for verify_object()
>> Since this operates on file descriptor, I think this can be performed
>> with the help of file locking mechanism.
> No.  Basically we don't need a lock mechanism for sheepdog objects;
> all objects are categorized into the following two groups.
>  - only one virtual machine can access the object
>  - all virtual machines can read the object, but no one updates it
> What we need to do here is atomic update of the vdi objects.  For
> example, if a total node failure happens while updating vdi objects, we
> need to roll back to the previous right state.
>> Solution for create_vdi_obj()
>> This can be fixed by introducing global pthread lock.
> Same as above.  If a master node fails while creating a new vdi
> object, the next master needs to take over the work, or more easily,
> delete the object and return an error code to the administrator.
> Thanks,
> Kazutaka
