Hi,

This is a follow-up to our last test. The ZooKeeper connection on one sheep node in our cluster timed out, so we tried to restart sheep on that node. However, sheep could not be started successfully. I am not sure whether the zk connection timeout could somehow cause recovery from the journal file to fail. Is this a bug in journal replay?

Best,
Hongyi

When sheep fails to start, sheep.log shows:
====================================================
May 08 06:03:03 [gway 13474] do_read(281) connection is closed (48 bytes left)
May 08 06:03:03 [gway 13474] exec_req(405) failed to read a response
May 08 06:03:45 [main] zk_create_seq_node(189) PANIC: failed, path:/sheepdog/queue/, invalid zhandle state
May 08 06:03:45 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:03:45 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:03:45 [main] sd_backtrace(833) zookeeper.c:189: zk_create_seq_node
May 08 06:03:45 [main] sd_backtrace(833) zookeeper.c:429: add_event
May 08 06:03:45 [main] sd_backtrace(833) group.c:339: queue_cluster_request
May 08 06:03:45 [main] sd_backtrace(833) request.c:613: finish_rx
May 08 06:03:45 [main] sd_backtrace(833) event.c:209: do_event_loop
May 08 06:03:45 [main] sd_backtrace(833) sheep.c:791: main
May 08 06:03:45 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:03:45 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:03:45 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:03:45 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:03:48 [main] crash_handler(487) sheep pid 8084 exited unexpectedly.
May 08 06:06:11 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:06:11 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:06:11 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:06:12 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:06:12 [main] replay_journal_entry(183) open No such file or directory
May 08 06:06:12 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:12 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:12 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:12 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:06:12 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:06:12 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:06:12 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:06:12 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:06:12 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:06:12 [main] crash_handler(487) sheep pid 19138 exited unexpectedly.
May 08 06:06:15 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:06:15 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:06:15 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:06:15 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:06:15 [main] replay_journal_entry(183) open No such file or directory
May 08 06:06:15 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:15 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:15 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:15 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:46 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:06:46 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:06:46 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:06:46 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:06:46 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:06:46 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:06:46 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:06:46 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:06:46 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:06:47 [main] crash_handler(487) sheep pid 19315 exited unexpectedly.
May 08 06:07:02 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:02 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:02 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:03 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:03 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:03 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:03 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:03 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:03 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:03 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:03 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:03 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:03 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:03 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:03 [main] crash_handler(487) sheep pid 19360 exited unexpectedly.
May 08 06:07:06 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:06 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:06 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:07 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:07 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:07 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:07 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:07 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:07 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:07 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:07 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:07 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:07 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:07 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:07 [main] crash_handler(487) sheep pid 19403 exited unexpectedly.
May 08 06:07:10 [main] md_add_disk(161) /sheep/disk1, nr 1
May 08 06:07:10 [main] md_add_disk(161) /sheep/disk2, nr 2
May 08 06:07:10 [main] send_join_request(1100) IPv4 ip:10.0.0.13 port:7000
May 08 06:07:11 [main] journal_get_path(148) /sheep/disk1/001d5fbd00005978, size 1548288, off 618496, 0
May 08 06:07:11 [main] replay_journal_entry(183) open No such file or directory
May 08 06:07:11 [main] check_recover_journal_file(280) PANIC: recoverying from journal file (new) failed
May 08 06:07:11 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
May 08 06:07:11 [main] sd_backtrace(833) sheep.c:183: crash_handler
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libpthread.so.0() [0x327420f4ff]
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x3273e328a4]
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x3273e34084]
May 08 06:07:11 [main] sd_backtrace(833) journal.c:280: check_recover_journal_file
May 08 06:07:11 [main] sd_backtrace(833) sheep.c:740: main
May 08 06:07:11 [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x3273e1ecdc]
May 08 06:07:11 [main] sd_backtrace(847) sheep() [0x403fe8]
May 08 06:07:11 [main] __dump_stack_frames(743) cannot find gdb
May 08 06:07:11 [main] __sd_dump_variable(693) cannot find gdb
May 08 06:07:11 [main] crash_handler(487) sheep pid 19442 exited unexpectedly.
=============================================================