[sheepdog] [PATCH RFT 0/4] garbage collect needless VIDs and inode objects

Hitoshi Mitake mitake.hitoshi at lab.ntt.co.jp
Wed Dec 17 08:31:53 CET 2014


At Tue, 16 Dec 2014 12:28:29 +0100,
Valerio Pachera wrote:
> 
> 2014-12-15 10:36 GMT+01:00 Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>:
> > Current sheepdog never recycles VIDs. This can cause problems,
> > e.g. VID space exhaustion and too many garbage inode objects.
> 
> I've been testing this branch and it seems to work.
> I use a script that creates 3 vdis with 3 snapshots each (writing 10M
> of data), then removes them all and looks for objects matching "80*".
> 
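For reference, a minimal sketch of such a test script could look like
the following; the vdi names, the write/snapshot ordering and the
/mnt/sheep store paths are assumptions, not the original script:

#!/bin/sh
# create 3 vdis with 3 snapshots each, writing 10M before each snapshot
for v in test1 test2 test3; do
    dog vdi create -P $v 1G
    for s in 1 2 3; do
        dd if=/dev/urandom bs=1M count=10 | dog vdi write $v
        dog vdi snapshot $v
    done
done
# remove every snapshot, then the vdis themselves
for v in test1 test2 test3; do
    for s in 1 2 3; do
        dog vdi delete -s $s $v
    done
    dog vdi delete $v
done
# any leftover vdi/inode objects have names starting with "80"
find /mnt/sheep -name '80*'
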
> With all snapshots active:
> /mnt/sheep/1/80fd366300000000
> /mnt/sheep/0/80fd381800000000
> /mnt/sheep/0/80fd32fc00000000
> /mnt/sheep/0/80fd32fd00000000
> /mnt/sheep/0/80fd32fe00000000
> 
> After removing all snapshots:
> /mnt/sheep/1/80fd366300000000
> /mnt/sheep/0/80fd381800000000
> /mnt/sheep/0/80fd32fc00000000
> /mnt/sheep/0/80fd32fd00000000
> /mnt/sheep/0/80fd32fe00000000
> 
> After removing all vdis:
> <empty>
> 
> sheep -v
> Sheepdog daemon version 0.9.0_25_g24ef77f
> 
> But I found a repeatable sheepdog crash!
> I noticed it happening when I ran the script a second time.
> The crash occurs when I recreate a vdi with the same name and then
> take a snapshot of it.
> 
> Dec 16 12:12:42   INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40067, op=DEL_VDI, result=00
> Dec 16 12:12:47   INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40069, op=DEL_VDI, data=(not string)
> Dec 16 12:12:47   INFO [main] run_vid_gc(2106) all members of the
> family (root: fd3662) are deleted
> Dec 16 12:12:47   INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40069, op=DEL_VDI, result=00
> Dec 16 12:13:57   INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40072, op=NEW_VDI, data=(not string)
> Dec 16 12:13:57   INFO [main] post_cluster_new_vdi(133)
> req->vdi.base_vdi_id: 0, rsp->vdi.vdi_id: fd32fc
> Dec 16 12:13:57   INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40072, op=NEW_VDI, result=00
> Dec 16 12:14:12   INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40074, op=NEW_VDI, data=(not string)
> Dec 16 12:14:13   INFO [main] post_cluster_new_vdi(133)
> req->vdi.base_vdi_id: 0, rsp->vdi.vdi_id: fd3815
> Dec 16 12:14:13   INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40074, op=NEW_VDI, result=00
> Dec 16 12:14:23   INFO [main] rx_main(830) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40076, op=NEW_VDI, data=(not string)
> Dec 16 12:14:23   INFO [main] post_cluster_new_vdi(133)
> req->vdi.base_vdi_id: 0, rsp->vdi.vdi_id: fd3662
> Dec 16 12:14:23   INFO [main] tx_main(882) req=0x7f314400e5a0, fd=26,
> client=127.0.0.1:40076, op=NEW_VDI, result=00
> Dec 16 12:14:34   INFO [main] rx_main(830) req=0x7f314400d310, fd=26,
> client=127.0.0.1:40078, op=NEW_VDI, data=(not string)
> Dec 16 12:14:34  EMERG [main] crash_handler(268) sheep exits
> unexpectedly (Segmentation fault).
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) sheep.c:270: crash_handler
> Dec 16 12:14:34  EMERG [main] sd_backtrace(847)
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f31515cc02f]
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) vdi.c:64:
> lookup_vdi_family_member
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) vdi.c:109: update_vdi_family
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) vdi.c:396: add_vdi_state
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) ops.c:674:
> cluster_notify_vdi_add
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) group.c:948: sd_notify_handler
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) zookeeper.c:1252:
> zk_event_handler
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) event.c:210: do_event_loop
> Dec 16 12:14:34  EMERG [main] sd_backtrace(833) sheep.c:963: main
> Dec 16 12:14:34  EMERG [main] sd_backtrace(847)
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc)
> [0x7f3150badeac]
> Dec 16 12:14:34  EMERG [main] sd_backtrace(847) sheep() [0x405fa8]
> 
> How to reproduce:
> 
> dog cluster format -c 2
> dog vdi create -P  test 1G
> dog vdi snapshot test
> dd if=/dev/urandom bs=1M count=10 | dog vdi write test
> dog vdi delete -s 1 test
> dog vdi delete test
> echo 'Recreating vdi test'
> dog vdi create -P  test 1G
> dog vdi snapshot test   <-- at this point, sheep crashes
> dog vdi list

Thanks for your report; I've fixed the problem and updated the gc-vid branch.
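
If you want to re-test, something like the following should exercise
both the crash path and the object GC; the build steps assume a
standard autotools checkout with the gc-vid branch available:

git checkout gc-vid
./autogen.sh && ./configure && make && sudo make install
# restart the sheep daemons, re-run the reproduction steps above,
# and confirm that "dog vdi snapshot test" no longer crashes sheep
# and that no 80* objects remain after deleting every vdi:
find /mnt/sheep -name '80*'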

Thanks,
Hitoshi



