[Sheepdog] Stale Objects Questions/Observations

Shawn Moore smmoore at gmail.com
Wed Feb 22 15:55:40 CET 2012


I have been following, using, and testing Sheepdog for a while now, and saw a
commit (https://github.com/collie/sheepdog/commit/a1cf3cbd28fb09ee0cf5dba30876b0502e8023dc)
whose comment mentioned stale object cleanup.  My questions/observations
follow.

I have been testing against the farm version of the code, and it looks
like not all of the objects are cleaned up when a vdi is deleted.  In
several tests, the VDI_ID00000000 files are left behind after the vdis
are deleted.  I have run multiple rounds of testing with 44 nodes: each
node creates a vdi called "NODEX", where X is a number, so I end up
with 44 vdis.  I have tried both 128MB and 10G vdis.  Regardless of
size, after all the vdis are deleted I am left with 88 files of 4MB
each; 88 because copies = 2, so only 44 of them have unique ids.
Should these remaining object files still be there once all the vdis
have been deleted?
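For reference, here is roughly how I have been counting the leftovers.  The
directory layout and the `00000000` filename suffix are assumptions based on
what I see under /sheep/obj on my nodes; adjust for your setup:

```shell
# count_leftover_objects DIR
# Prints "<files> <unique_ids>" for object files whose name ends in
# 00000000 anywhere under DIR.  With copies = 2, each object should
# appear on two nodes, so unique_ids should be half the file count.
count_leftover_objects() {
    files=$(( $(find "$1" -type f -name '*00000000' | wc -l) ))
    unique=$(( $(find "$1" -type f -name '*00000000' \
        -exec basename {} \; | sort -u | wc -l) ))
    echo "$files $unique"
}
```

Run against /sheep/obj after deleting all vdis, my clusters print "88 44".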

I would also like to know whether epoch cleanup is supposed to be fixed
yet.  When I drop a node and then bring it back, the amount of data in
the cluster, from the perspective of the OS, grows even though no
activity took place inside the vdis.  This appears to be limited to the
node that dropped.  The cluster was at epoch 1 when I dropped the node,
and I waited until the cluster reported being back to full size before
bringing the node back.  After the cluster again reported full size, it
was at epoch 3 (understood, since epoch 2 was the node dropping and
epoch 3 was it coming back).  But the OS reports double the space used
on that node.  There is no object directory for epoch 2 on it, because
it was not a member of the cluster during epoch 2, but directories 1
and 3 are both the same size.

To finish the test, I deleted all vdis.  After a few minutes, "collie
node info -r" shows "Total 2000572 3695091 0% 0".  It reports no data,
but according to OS utilities, over 900GB still remains under
"/sheep/obj/*".  I then shut the cluster down with "collie cluster
shutdown", waited a bit, and restarted it; now it shows "Total 2922982
9253980 31% 0".  "collie vdi list -r" shows no vdis, and the OS still
reports over 900GB.  If there are no vdis, why is there still over
900GB worth of data?  I assume the 900GB includes roughly 21GB of data
duplicated between epochs 1 and 3 on the node that dropped and came
back, but I would think the old data should be purged in this situation
as well.
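To tell whether the doubled space on that node is real duplication or just
hard links between the two epoch directories, I have been comparing inodes
with a small helper.  The function name and the two-directory layout are my
own assumptions, not anything from the sheepdog tree:

```shell
# compare_epoch_dirs DIR_A DIR_B
# For each file in DIR_A whose name also exists in DIR_B, report whether
# the pair shares an inode (a hard link, so no extra space is used) or is
# an independent copy (real duplication on disk).
compare_epoch_dirs() {
    a=$1; b=$2
    linked=0; copied=0
    for f in "$a"/*; do
        name=$(basename "$f")
        [ -f "$b/$name" ] || continue
        if [ "$f" -ef "$b/$name" ]; then
            linked=$((linked + 1))     # same inode: hard link
        else
            copied=$((copied + 1))     # different inode: duplicate data
        fi
    done
    echo "$linked hard-linked, $copied duplicated"
}
```

Something like `compare_epoch_dirs /sheep/obj/1 /sheep/obj/3` on the node
that dropped would show whether the 21GB of apparent duplication is real.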

I have lots of data, if desired, showing which files/times/sizes were in
/sheep/obj/* at various stages, as well as what the OS reported at the
same time.  I get inconsistent results from various du commands, but I
believe that is due to the use of hard links between epochs.
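The du confusion is reproducible without sheepdog at all: du counts each
inode only once per invocation, so summing separate per-directory runs
overstates usage compared with one combined run.  A minimal demonstration
(the epoch directory names are just illustrative):

```shell
# du counts a hard-linked inode once per invocation, so separate runs
# over two epoch directories that share files sum to more than one
# combined run over both.
tmp=$(mktemp -d)
mkdir "$tmp/epoch1" "$tmp/epoch3"
dd if=/dev/zero of="$tmp/epoch1/obj" bs=1024 count=1024 2>/dev/null
ln "$tmp/epoch1/obj" "$tmp/epoch3/obj"   # second epoch shares the inode

du -sk "$tmp/epoch1" "$tmp/epoch3"       # one run: obj counted once
du -sk "$tmp/epoch1"                     # separate runs: obj counted
du -sk "$tmp/epoch3"                     #   once in each, i.e. twice
rm -rf "$tmp"
```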

Regards,

Shawn


