On 05/24/2012 11:10 PM, Christoph Hellwig wrote:
> With the current sheepdog code a test case like the following pseudocode
> fails:
>
> for host in 1..4:
>     for shep in 1..5:
>         sheep
>
> collie cluster format --copies=2
> collie vdi create test-vdi 100M
> collie vdi write test-vdi <100M of random data>
>
> for host in 1:
>     for shep in 1..5:
>         kill sheep
>
> sleep 60 # wait for recovery to finish
>
> for host in 2:
>     for shep in 1..5:
>         kill sheep
>
> sleep 60 # wait for recovery to finish
>
> collie vdi write test-vdi -a host3 <any data>
>
>
> fails because some objects are only replicated once, not twice. Debugging
> showed that the problem was that the SD_OP_GET_OBJ_LIST command for some
> of the remaining sheep did not return the full object list.
>
> Reverting the object list cache fixed the bug, and when I noticed that
> the buffer scheme recently introduced could be applied directly to the
> farm trunk active list I decided to go down that route instead of debugging
> the object cache more. I can't see how the additional rbtree can provide
> better performance than just using the farm data structures directly,
> but I'm open for arguments.
>

trunk_active_list is used to track the objects in the working directory,
so it can't be used as the object list, which has a different updating
strategy.

If we implemented the object list inside Farm, we would just be moving the
object list cache code into Farm, not reducing it. Unless we find a
simpler and more efficient implementation inside Farm than the current
object list cache, we had better leave it outside Farm.

So I think we need to fix the bug in the object list cache.

Thanks,
Yuan