[Sheepdog] Question about get_obj_list()

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Sep 15 05:14:01 CEST 2011


At Wed, 14 Sep 2011 14:32:39 +0800,
Liu Yuan wrote:
> 
> Hi,
>      I am writing something that can get object distribution stats for a 
> specified image, like
> 
>      dev at taobao:~/sheepdog$ collie/collie vdi object tailai.ly --stat
>      node                     number of objects
>      192.168.0.1:7000        96
>      192.168.0.2:7000        95
>      192.168.0.3:7000        97
>      ....

Sounds nice. :)

> 
>      In the process, I found a bug in get_obj_list(), which would result 
> in sheep aborting when handling SD_OP_GET_OBJ_LIST. I traced it and found 
> the culprit was 'buf', which serves as a buffer for the object 
> list and is zalloc'ed from sheep's heap. The problem is that the metadata 
> glibc's malloc implementation keeps for 'buf' would sometimes get 
> corrupted, and the following 'free(buf)' would cause
> 
>      *** glibc detected *** sheep/sheep: double free or corruption (out)
> 
> or a similar error, and the sheep process terminated.
> 
>      From my personal understanding of the code, get_obj_list() serves 
> to return a list of *targeted* objects to the requester. The 
> patch [sheep: remove object list file] changed its logic a bit, and there 
> is now a loop that iterates from epoch 1 to epoch n to merge all the 
> objects it finds.
> 
>     I am not sure which line of code overruns 'buf', but when I 
> remove the for loop and just return the object list from the one 
> targeted epoch, I no longer see the problem.

Oops, I'll take a look at this issue.
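
Independent of where the overrun actually is, one common way to make this
kind of merge safe is to check the capacity on every insert into the list
buffer. The following is only a minimal sketch of that pattern, not the
get_obj_list() code: struct oid_list and oid_list_add() are hypothetical
names invented for illustration, and the real buffer layout in sheep may
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity list of object IDs; not a Sheepdog type. */
struct oid_list {
	uint64_t *oids;	/* caller-provided storage */
	size_t nr;	/* entries currently in use */
	size_t cap;	/* number of entries that fit in oids */
};

/*
 * Insert oid, keeping the list sorted and free of duplicates.
 * Returns 0 on success (or if oid is already present) and -1 if the
 * list is full, so the caller never writes past the end of the buffer.
 */
int oid_list_add(struct oid_list *l, uint64_t oid)
{
	size_t lo = 0, hi, i;

	assert(l != NULL && l->nr <= l->cap);
	hi = l->nr;

	/* Binary search for the first slot whose value is >= oid. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (l->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}

	if (lo < l->nr && l->oids[lo] == oid)
		return 0;	/* already seen, e.g. in another epoch */
	if (l->nr == l->cap)
		return -1;	/* full: refuse instead of overrunning */

	/* Shift the tail up by one slot to make room. */
	for (i = l->nr; i > lo; i--)
		l->oids[i] = l->oids[i - 1];
	l->oids[lo] = oid;
	l->nr++;
	return 0;
}
```

A caller would typically treat -1 as "buffer too small" and either grow
the buffer or fail the request, rather than continuing to write.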

> 
>      So my question is, what is the idea behind the for loop? The 
> SD_OP_GET_OBJ_LIST request is served when the node is active (it agrees 
> on the epoch that the other nodes can see), so the targeted epoch exists 
> for sure when serving the request. Actually, old objects in the old 
> epochs need to be cleaned up, in my opinion. So why bother searching 
> them and getting a list from them? I think they are simply stale hardlinks.

It is there to handle multiple node failures.  If Sheepdog increments
the epoch number before finishing object recovery, the latest epoch
directory will not have all the objects it should have.  To handle this
problem, the previous Sheepdog created an object list file just
after updating the epoch, but there were some problems with it.  I think
the simplest way to create a correct object list is to search all the
objects in the Sheepdog cluster, and this is the reason for the loop.
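
To illustrate the reasoning: if recovery into epoch 2 was interrupted by
another failure, the directory for epoch 2 holds only part of the objects,
and only the union over all epochs is complete. The sketch below shows
that union on in-memory arrays. It is an assumption-laden illustration,
not the sheep implementation: union_epoch_lists() and oid_cmp() are
hypothetical helpers, and the per-epoch lists stand in for whatever a
directory scan would return.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Compare two object IDs for qsort(). */
static int oid_cmp(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: build the union of per-epoch object ID lists.
 * lists[e] holds counts[e] IDs found for epoch e + 1.
 * Returns a malloc'ed, sorted array without duplicates and stores its
 * length in *nr_out.  Returns NULL if there are no IDs at all or if
 * the allocation fails.  The caller frees the result.
 */
uint64_t *union_epoch_lists(const uint64_t *const *lists,
			    const size_t *counts, size_t nr_epochs,
			    size_t *nr_out)
{
	size_t total = 0, e, i, n = 0;
	uint64_t *all;

	assert(nr_out != NULL);
	*nr_out = 0;

	for (e = 0; e < nr_epochs; e++)
		total += counts[e];
	if (total == 0)
		return NULL;

	/* Sized for the worst case: no ID shared between epochs. */
	all = malloc(total * sizeof(*all));
	if (!all)
		return NULL;

	for (e = 0; e < nr_epochs; e++) {
		if (counts[e] == 0)
			continue;
		memcpy(all + n, lists[e], counts[e] * sizeof(*all));
		n += counts[e];
	}
	qsort(all, total, sizeof(*all), oid_cmp);

	/* Drop adjacent duplicates in place. */
	for (i = 1, n = 1; i < total; i++)
		if (all[i] != all[n - 1])
			all[n++] = all[i];

	*nr_out = n;
	return all;
}
```

Sizing the buffer from the summed counts before copying is the design
choice that matters here: the merged list can never be larger than that
sum, so the copy and the in-place deduplication stay within bounds.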


Thanks,

Kazutaka


