[sheepdog] ZooKeeper quitting unexpectedly causes recovery from journal file to fail

Hongyi Wang hongyi at zelin.io
Thu May 23 07:29:41 CEST 2013


Hi,

This is a follow-up to our last test. One sheep node in our cluster hit a
ZooKeeper connection timeout, so we tried to restart sheep on that node.
However, sheep could not be started successfully.
I am not sure whether the zk connection timeout could somehow cause the
recovery from the journal file to fail. Is this a bug in journal replay?

best,

Hongyi

When sheep fails to start, sheep.log shows:
====================================================
May 08 06:03:03 [gway 13474] do_read(281) connection is closed (48 bytes left)
May 08 06:03:03 [gway 13474] exec_req(405) failed to read a response
May 08 06:03:45 [main] zk_create_seq_node(189) PANIC: failed, path:/sheepdog/queue/, invalid zhandle state
May 08 06:03:45 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:03:45 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:03:45 [main] sd_backtrace(833) zookeeper.c:189: zk_create_seq_node
May 08 06:03:45 [main] sd_backtrace(833) zookeeper.c:429: add_event
May 08 06:03:45 [main] sd_backtrace(833) group.c:339: queue_cluster_request
May 08 06:03:45 [main] sd_backtrace(833) request.c:613: finish_rx
May 08 06:03:45 [main] sd_backtrace(833) event.c:209: do_event_loop
May 08 06:03:45 [main] sd_backtrace(833) sheep.c:791: main
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:03:45 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:03:45 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:03:45 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:03:48 [main] crash_handler(487) sheep pid 8084 exited unexpectedly.
May 08 06:06:11 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:06:11 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:06:11 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:06:12 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:06:12 [main] replay_journal_entry(183) open No such file or directory
May 08 06:06:12 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:12 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:12 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:12 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:06:12 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:06:12 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:06:12 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:06:12 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:06:12 [main] crash_handler(487) sheep pid 19138 exited unexpectedly.
May 08 06:06:15 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:06:15 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:06:15 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:06:15 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:06:15 [main] replay_journal_entry(183) open No such file or directory
May 08 06:06:15 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:15 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:15 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:46 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:46 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:46 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:46 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:06:46 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:06:46 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:06:46 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:06:46 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:06:47 [main] crash_handler(487) sheep pid 19315 exited unexpectedly.
May 08 06:07:02 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:02 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:02 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:03 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:03 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:03 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:03 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:03 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:03 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:03 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:03 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:03 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:03 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:03 [main] crash_handler(487) sheep pid 19360 exited unexpectedly.
May 08 06:07:06 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:06 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:06 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:07 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:07 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:07 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:07 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:07 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:07 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:07 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:07 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:07 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:07 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:07 [main] crash_handler(487) sheep pid 19403 exited unexpectedly.
May 08 06:07:10 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:10 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:10 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:11 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:11 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:11 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:11 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:11 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:11 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:11 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:11 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:11 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:11 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:11 [main] crash_handler(487) sheep pid 19442 exited unexpectedly.
=============================================================