[Sheepdog] Question about get_obj_list()

Liu Yuan namei.unix at gmail.com
Wed Sep 14 08:32:39 CEST 2011


Hi,
     I am writing something that gets the object distribution statistics 
for a specified image, like

     dev@taobao:~/sheepdog$ collie/collie vdi object tailai.ly --stat
     node                     number of objects
     192.168.0.1:7000        96
     192.168.0.2:7000        95
     192.168.0.3:7000        97
     ....

     In the process, I found a bug in get_obj_list() that makes sheep 
abort when handling SD_OP_GET_OBJ_LIST. I traced it and found that the 
culprit is 'buf', the buffer for the object list, which is zalloc'ed 
from sheep's heap. The problem is that the metadata that glibc's malloc 
implementation keeps for 'buf' sometimes gets corrupted, and the 
following 'free(buf)' then causes

     *** glibc detected *** sheep/sheep: double free or corruption (out)

or a similar error, and the sheep process terminates.

     From my understanding of the code, get_obj_list() returns a list 
of the *targeted* objects to the requester. The patch [sheep: remove 
object list file] changed its logic a bit, and there is now a loop that 
iterates from epoch 1 to epoch n and merges all the objects it finds.

     I am not sure which line of code overruns 'buf', but when I remove 
the for loop and just return the object list from the one targeted 
epoch, I no longer see the problem.
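
     To illustrate the kind of guard that would rule out such an overrun 
when lists from several epochs are merged into one buffer, here is a 
minimal sketch in C. It is not the sheepdog code: merge_objlist(), its 
parameters, and the representation of an object list as a sorted array 
of uint64_t IDs are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the sheepdog tree).
 *
 * Merge the object IDs in src[0..nr_src) into dst[0..nr_dst), which is
 * kept sorted and free of duplicates. dst has room for 'cap' entries.
 *
 * Returns the new number of entries in dst, or -1 as soon as one more
 * entry would not fit. In that case dst keeps whatever was merged so
 * far, but nothing is ever written past dst[cap - 1].
 */
static int merge_objlist(uint64_t *dst, size_t nr_dst, size_t cap,
                         const uint64_t *src, size_t nr_src)
{
    assert(dst != NULL && nr_dst <= cap);

    for (size_t i = 0; i < nr_src; i++) {
        size_t pos = 0;

        /* Find the first entry that is not smaller than src[i]. */
        while (pos < nr_dst && dst[pos] < src[i])
            pos++;

        if (pos < nr_dst && dst[pos] == src[i])
            continue; /* already listed, e.g. by an earlier epoch */

        if (nr_dst == cap)
            return -1; /* refuse to write past the end of dst */

        /* Shift the tail up by one slot and insert the new ID. */
        memmove(&dst[pos + 1], &dst[pos],
                (nr_dst - pos) * sizeof(dst[0]));
        dst[pos] = src[i];
        nr_dst++;
    }

    return (int)nr_dst;
}
```

     A caller looping over the epochs would pass the same 'cap' on 
every iteration and stop on -1, instead of letting the merged list grow 
past the size that 'buf' was allocated with.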

     So my question is: what is the idea behind the for loop? The 
SD_OP_GET_OBJ_LIST request is served only when the node is active (that 
is, it agrees with the other nodes on the current epoch), so the 
targeted epoch is sure to exist when the request is served. In my 
opinion, the old objects in the old epochs need to be cleaned up anyway. 
So why bother searching them and getting lists from them? I think they 
are simply stale hardlinks.

Thanks,
Yuan


