[sheepdog-users] Sheep crash on node failure

Ing. Luca Lazzeroni - Trend Servizi Srl luca at gvnet.it
Thu Jun 27 12:06:01 CEST 2013


Yes, you're right.
I've tried the latest git master and it fixes the problem.
Thank you.
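
For anyone else hitting this, building sheep from git master is the usual autotools flow; the repository URL and the dependency note below are from memory and may need adjusting for your environment:

  # clone and build the current master branch
  git clone https://github.com/sheepdog/sheepdog.git
  cd sheepdog
  ./autogen.sh
  ./configure          # needs a cluster driver (e.g. corosync) and its dev headers
  make && sudo make install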

On 27 Jun 2013, at 11:53, Liu Yuan <namei.unix at gmail.com> wrote:

> On Thu, Jun 27, 2013 at 11:25:44AM +0200, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>> Hi,
>> I'm testing sheepdog 0.6.0 for internal purposes on a cluster with 3 nodes.
>> Two of the nodes have an identical configuration: one SSD for the sheepdog object cache and two 7.2K RPM SATA drives for data storage, assigned to the cluster via the MD feature.
>> The third machine has three 7.2K RPM SATA drives: one for the cache and the other two for data.
>> 
>> The data drives have identical capacity: 1 TB each.
>> The data drives are assigned to sheepdog via the MD feature.
>> The cluster is formatted with 2 copies.
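>> 
>> For clarity, the setup corresponds roughly to commands like these (the paths and the object-cache size/dir sub-options are illustrative guesses, so check sheep's help for the exact 0.6 syntax):
>> 
>>   # per node: comma-separated store paths place the data disks under the MD
>>   # feature; -w enables the object cache on the SSD (sub-option syntax assumed)
>>   sheep -w size=20G,dir=/mnt/ssd/cache /var/lib/sheepdog,/mnt/sata1,/mnt/sata2
>> 
>>   # once, from any node: format the cluster with 2 copies
>>   collie cluster format --copies=2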
>> 
>> Everything works fine, except for one thing: if a qemu VM is running on one node and I manually kill sheep on one of the others, after a few seconds the sheep daemon on the node running the VM dies with this error:
>> 
>> Jun 27 11:09:10 [main] recover_object_main(625) done:76 count:14456, oid:33a95f0000623c
>> Jun 27 11:09:10 [main] recover_object_main(625) done:77 count:14456, oid:ce73a800000030
>> Jun 27 11:09:10 [gway 20930] gateway_forward_request(308) fail to write local e04cc600000001, No object found
>> Jun 27 11:09:10 [oc_push 20909] push_cache_object(471) failed to push object No object found
>> Jun 27 11:09:10 [oc_push 20909] do_push_object(841) PANIC: push failed but should never fail
>> Jun 27 11:09:10 [oc_push 20909] crash_handler(180) sheep exits unexpectedly (Aborted).
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(833) sheep.c:182: crash_handler
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbcf) [0x7fc23fd0dbcf]
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x36) [0x7fc23f25b036]
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7fc23f25e697]
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(833) object_cache.c:841: do_push_object
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(833) work.c:243: worker_routine
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8d) [0x7fc23fd05f8d]
>> Jun 27 11:09:10 [oc_push 20909] sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7fc23f31de1c]
>> Jun 27 11:09:10 [oc_push 20909] __dump_stack_frames(743) cannot find gdb
>> Jun 27 11:09:10 [oc_push 20909] __sd_dump_variable(693) cannot find gdb
>> Jun 27 11:09:10 [main] crash_handler(487) sheep pid 20658 exited unexpectedly.
>> 
>> If I manually restart sheep on the crashed node, recovery starts again and I have to wait until the cluster has fully recovered before starting the VM on that node again.
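>> 
>> To see when recovery has finished I watch the cluster state with commands along these lines (the exact subcommands available depend on the collie build; this shows the approach rather than an exact transcript):
>> 
>>   collie cluster info   # cluster status and current epoch
>>   collie node list      # confirms the restarted node has rejoined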
>> 
>> The crash is repeatable.
>> 
>> Note that without the cache nothing goes wrong; the VM runs without problems.
>> 
>> Any ideas? Is my setup wrong?
> 
> The problem is already fixed in the latest git master. Could you please try the
> master branch and verify the fix?
> 
> Thanks,
> Yuan

Ing. Luca Lazzeroni - Trend Servizi Srl
Responsabile R&D
http://www.trendservizi.it






