At Fri, 19 Jul 2013 23:22:02 +0200,
Ing. Luca Lazzeroni - Trend Servizi Srl wrote:

Hi Luca,

>
> Hi to everybody,
> I'm experiencing problems with sheepdog-0.6 + cache + snapshot.
> The system is a 3-node cluster; 2 nodes have cache enabled.
>
> The problem is this:
>
> Node A - 2 VMs running
>
> 1) Via a bash script I suspend an I/O-heavy loaded VM
> 2) After suspending it I make a snapshot with "collie vdi snapshot -s xxxxx vdiname"
> 3) I resume the VM
> 4) I copy the image outside of the cluster via "qemu-img convert -O qcow2 sheepdog:vdiname:1 pippo.qcow2"
>
> Sometimes (but not every time) sheep panics on node A (while on nodes B and C everything continues to work => cache problem) with this backtrace:
>
> Jul 19 22:56:39 [gway 19609] add_to_lru_cache(660) PANIC: the object already exist
> Jul 19 22:56:39 [gway 19609] crash_handler(180) sheep exits unexpectedly (Aborted).
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) sheep.c:182: crash_handler
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbcf) [0x7f9eefa56bcf]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x36) [0x7f9eeefa4036]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7f9eeefa7697]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:660: add_to_lru_cache
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:710: object_cache_lookup
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:1073: object_cache_handle_request
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) ops.c:1385: do_process_work
> Jul 19 22:56:40 [gway 19609] sd_backtrace(833) work.c:243: worker_routine
> Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8d) [0x7f9eefa4ef8d]
> Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f9eef066e1c]
> Jul 19 22:56:40 [gway 19609] __dump_stack_frames(743) cannot find gdb
> Jul 19 22:56:40 [gway 19609] __sd_dump_variable(693) cannot find gdb
> Jul 19 22:56:40 [main] crash_handler(487) sheep pid 14333 exited unexpectedly.
>
> I started experiencing this problem after applying today's PATCH stable-0.6 2/3 => sheep: delete cache objects only when they are succesfully pushed.
>
> I think the behaviour is due to the high I/O load on the VM (while it is being backed up) and the cache being flushed by the snapshot, but I'm not sure.

Thanks for your report. This is a very serious problem. I'll look at it.

Thanks,
Hitoshi