Hi everybody,

I'm experiencing problems with sheepdog-0.6 + cache + snapshot.

The system is a 3-node cluster; 2 nodes have the cache enabled. Node A has 2 VMs running. The problem occurs during this sequence on node A:

1) Via a bash script I suspend a VM that is under heavy I/O load
2) After suspending it I take a snapshot with "collie vdi snapshot -s xxxxx vdiname"
3) I resume the VM
4) I copy the image out of the cluster with "qemu-img convert -O qcow2 sheepdog:vdiname:1 pippo.qcow2"

Sometimes (but not every time) sheep panics on node A, while on nodes B and C everything continues to work, which suggests a cache problem. The backtrace is:

Jul 19 22:56:39 [gway 19609] add_to_lru_cache(660) PANIC: the object already exist
Jul 19 22:56:39 [gway 19609] crash_handler(180) sheep exits unexpectedly (Aborted).
Jul 19 22:56:39 [gway 19609] sd_backtrace(833) sheep.c:182: crash_handler
Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbcf) [0x7f9eefa56bcf]
Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x36) [0x7f9eeefa4036]
Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7f9eeefa7697]
Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:660: add_to_lru_cache
Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:710: object_cache_lookup
Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:1073: object_cache_handle_request
Jul 19 22:56:39 [gway 19609] sd_backtrace(833) ops.c:1385: do_process_work
Jul 19 22:56:40 [gway 19609] sd_backtrace(833) work.c:243: worker_routine
Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8d) [0x7f9eefa4ef8d]
Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f9eef066e1c]
Jul 19 22:56:40 [gway 19609] __dump_stack_frames(743) cannot find gdb
Jul 19 22:56:40 [gway 19609] __sd_dump_variable(693) cannot find gdb
Jul 19 22:56:40 [main] crash_handler(487) sheep pid 14333 exited unexpectedly.

I started seeing this problem after applying today's patch "[PATCH stable-0.6 2/3] sheep: delete cache objects only when they are successfully pushed".

I think the behaviour is caused by the high I/O load on the VM while it is being backed up, combined with the cache being flushed by the snapshot, but I'm not sure.

Ing. Luca Lazzeroni - Trend Servizi Srl
Head of R&D
http://www.trendservizi.it
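For anyone trying to reproduce this, the backup sequence in steps 1-4 can be sketched roughly as the script below. This is only an illustration, not the actual script: it assumes the VM is managed by libvirt (so suspend/resume go through `virsh`), and `VM`, `VDI`, `TAG`, `OUT`, `DRY_RUN`, `run` and `backup_vdi` are hypothetical names. The `collie` and `qemu-img` invocations are the ones quoted in the steps above. By default the sketch only prints the commands.

```shell
#!/bin/bash
# Sketch of the suspend / snapshot / resume / export sequence.
# Assumptions (not from the original report): libvirt-managed VM,
# placeholder names for the VM, VDI, snapshot tag and output file.

VM="${VM:-vmname}"          # hypothetical libvirt domain name
VDI="${VDI:-vdiname}"       # sheepdog VDI backing the VM
TAG="${TAG:-xxxxx}"         # snapshot tag passed to collie
OUT="${OUT:-pippo.qcow2}"   # exported image
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_vdi() {
    local rc

    # 1) suspend the VM so the snapshot is taken while it is paused
    run virsh suspend "$VM" || return 1

    # 2) take the snapshot; remember the result but keep going,
    #    so the VM is resumed even if the snapshot fails
    run collie vdi snapshot -s "$TAG" "$VDI"
    rc=$?

    # 3) resume the VM
    run virsh resume "$VM" || return 1
    [ "$rc" -eq 0 ] || return "$rc"

    # 4) copy snapshot 1 out of the cluster as a qcow2 image
    run qemu-img convert -O qcow2 "sheepdog:$VDI:1" "$OUT"
}
```

Calling `backup_vdi` with `DRY_RUN=0` would run the commands for real; with the default it just lists them, which is enough to check the order of operations.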