[sheepdog-users] Panic problem with stable-0.6

Sat Jul 20 04:02:14 CEST 2013

At Fri, 19 Jul 2013 23:22:02 +0200,
Ing. Luca Lazzeroni - Trend Servizi Srl wrote:

Hi Luca,

> 
> Hi to everybody,
> I'm experiencing problems with sheepdog-0.6 + cache + snapshot.
> The system is a 3-node cluster; 2 nodes have cache enabled.
> 
> The problem is this:
> 
> Node A - 2 VM running
> 
> 1) Via a bash script I suspend an I/O-heavy VM
> 2) After suspending it, I take a snapshot with "collie vdi snapshot -s xxxxx vdiname"
> 3) I resume the VM
> 4) I copy the image out of the cluster via "qemu-img convert -O qcow2 sheepdog:vdiname:1 pippo.qcow2"
> 
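
For anyone trying to reproduce this, the four steps above could be sketched
roughly as below. The report does not say how the VM is suspended, so the
virsh suspend/resume calls, the domain name "vm1", and the run/backup_vdi
helpers with their DRY_RUN switch are assumptions made for illustration;
only the collie and qemu-img commands come from the report.

```shell
#!/bin/bash
# Hypothetical reproduction sketch, not the reporter's actual script.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_vdi DOMAIN VDI TAG SNAPID OUTFILE
#   DOMAIN  libvirt domain name (assumption: the VM is managed by libvirt)
#   VDI     sheepdog VDI name
#   TAG     snapshot tag passed to "collie vdi snapshot -s"
#   SNAPID  snapshot ID used in the sheepdog:VDI:SNAPID image spec
#   OUTFILE qcow2 file written outside the cluster
backup_vdi() {
    local domain=$1 vdi=$2 tag=$3 snapid=$4 out=$5
    run virsh suspend "$domain"                  # step 1 (assumed command)
    run collie vdi snapshot -s "$tag" "$vdi"     # step 2
    run virsh resume "$domain"                   # step 3 (assumed command)
    run qemu-img convert -O qcow2 "sheepdog:$vdi:$snapid" "$out"   # step 4
}
```

With the default DRY_RUN=1, "backup_vdi vm1 vdiname xxxxx 1 pippo.qcow2" only
prints the four commands; setting DRY_RUN=0 would actually run them.
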
> Sometimes (but not every time) sheep panics on node A (while on nodes B and C everything continues to work => cache problem) with this backtrace:
> 
> Jul 19 22:56:39 [gway 19609] add_to_lru_cache(660) PANIC: the object already exist
> Jul 19 22:56:39 [gway 19609] crash_handler(180) sheep exits unexpectedly (Aborted).
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) sheep.c:182: crash_handler
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbcf) [0x7f9eefa56bcf]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x36) [0x7f9eeefa4036]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7f9eeefa7697]
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:660: add_to_lru_cache
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:710: object_cache_lookup
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) object_cache.c:1073: object_cache_handle_request
> Jul 19 22:56:39 [gway 19609] sd_backtrace(833) ops.c:1385: do_process_work
> Jul 19 22:56:40 [gway 19609] sd_backtrace(833) work.c:243: worker_routine
> Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8d) [0x7f9eefa4ef8d]
> Jul 19 22:56:40 [gway 19609] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f9eef066e1c]
> Jul 19 22:56:40 [gway 19609] __dump_stack_frames(743) cannot find gdb
> Jul 19 22:56:40 [gway 19609] __sd_dump_variable(693) cannot find gdb
> Jul 19 22:56:40 [main] crash_handler(487) sheep pid 14333 exited unexpectedly.
> 
> I've started experiencing this problem after applying today's PATCH stable-0.6 2/3 => sheep: delete cache objects only when they are succesfully pushed.
> 
> I think the behaviour is due to the high I/O load on the VM (while it is being backed up) combined with the cache being flushed by the snapshot, but I'm not sure.

Thanks for your report. This is a very serious problem. I'll look at it.

Thanks,
Hitoshi