2013/8/8 Jan Friesse <jfriesse at redhat.com>:
> Few retransmits of packets is pretty normal because of UDP.

I cut the full list of messages; there were 2499 of them (see below).

> - how often you get this messages?

Months ago I had this problem with an earlier version of corosync (1.4.3, shipped with Debian). I would have to search the mail archive to find the exact date, but I guess it doesn't matter much.
It is certainly not something regular. As you can see here, I got a single message in syslog.7.gz (1 August):

root@sheepdog001:~# zgrep -c '\[TOTEM \] Retransmit List' /var/log/syslog.*
/var/log/syslog.1:2499
/var/log/syslog.2.gz:0
/var/log/syslog.3.gz:0
/var/log/syslog.4.gz:0
/var/log/syslog.5.gz:0
/var/log/syslog.6.gz:0
/var/log/syslog.7.gz:1

> - Every time node starts?

I just restarted the cluster. I got no '[TOTEM ] Retransmit List' messages in the syslog of any node.

> - After ~two minutes of running?

No.

> - Isn't there any big IO/CPU load causing corosync to not to be
> scheduled properly?

Like every day, the cluster received lots of data on a guest named 'backup'. I don't think anything different from other days happened, except the call trace message, but that was well before the crash.
I can't tell you whether one of the two running guests went crazy and stole too many resources.

Yuan, please tell me if you see anything to worry about in the sheep.log after the restart.

sheepdog001 (the first one started)
Aug 08 10:29:56 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:29:56 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:29:57 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:29:57 [main] send_join_request(1095) IPv4 ip:192.168.6.41 port:7000
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:29:57 [main] init_vdi_state(195) failed to read inode header 800e4aa600000000 0
Aug 08 10:29:57 [main] init_vdi_state(195) failed to read inode header 80c8d12e00000000 0
Aug 08 10:30:00 [main] init_vdi_state(195) failed to read inode header 80c8d13700000000 0
Aug 08 10:30:01 [main] init_vdi_state(195) failed to read inode header 80f131b700000000 0
Aug 08 10:30:01 [main] init_vdi_state(195) failed to read inode header 80c8d13e00000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header 80c8d13600000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header 80c8d12800000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header 80c8d14400000000 0
Aug 08 10:30:02 [main] check_host_env(405) Allowed core file size 0, suggested unlimited
Aug 08 10:30:02 [main] main(790) sheepdog daemon (version 0.6.0_62_gdff7a77) started
Aug 08 10:30:02 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 0
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:30:43 [main] send_join_request(1095) IPv4 ip:192.168.6.41 port:7000
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 800e4aa600000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d12e00000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d13700000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80f131b700000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d13e00000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d13600000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d12800000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header 80c8d14400000000 0
Aug 08 10:30:43 [main] check_host_env(405) Allowed core file size 0, suggested unlimited
Aug 08 10:30:43 [main] main(790) sheepdog daemon (version 0.6.0_62_gdff7a77) started
Aug 08 10:30:43 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 0
Aug 08 10:31:20 [main] sd_check_join_cb(1055) 192.168.6.42:7000: ret = 0x0, cluster_status = 0x4
Aug 08 10:31:20 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 1
Aug 08 11:00:59 [main] sd_check_join_cb(1055) 192.168.6.43:7000: ret = 0x0, cluster_status = 0x4
Aug 08 11:00:59 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 1
Aug 08 11:01:14 [main] sd_check_join_cb(1055) 192.168.6.44:7000: ret = 0x0, cluster_status = 0x1
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1, finished: 1

sheepdog002
Aug 07 21:09:54 [main] cdrv_cpg_confchg(602) PANIC: Network partition is detected
Aug 07 21:09:54 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:183: crash_handler
Aug 07 21:09:54 [main] sd_backtrace(848) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7fd1e9afd02f]
Aug 07 21:09:54 [main] sd_backtrace(848) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7fd1e9109474]
Aug 07 21:09:54 [main] sd_backtrace(848) /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7fd1e910c6ef]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:602: cdrv_cpg_confchg
Aug 07 21:09:54 [main] sd_backtrace(848) /usr/lib/libcpg.so.4(cpg_dispatch+0x594) [0x7fd1e9668d74]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:744: corosync_handler
Aug 07 21:09:54 [main] sd_backtrace(834) event.c:209: do_event_loop
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:795: main
Aug 07 21:09:54 [main] sd_backtrace(848) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc) [0x7fd1e90f5eac]
Aug 07 21:09:54 [main] sd_backtrace(848) sheep() [0x405498]
Aug 07 21:09:54 [main] __dump_stack_frames(744) cannot find gdb
Aug 07 21:09:54 [main] __sd_dump_variable(694) cannot find gdb
Aug 07 21:09:54 [main] crash_handler(487) sheep pid 20833 exited unexpectedly.
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:31:20 [main] send_join_request(1095) IPv4 ip:192.168.6.42 port:7000
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:31:21 [main] check_host_env(405) Allowed core file size 0, suggested unlimited
Aug 08 10:31:21 [main] main(790) sheepdog daemon (version 0.6.0_62_gdff7a77) started
Aug 08 10:31:21 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 0
Aug 08 11:00:59 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 1
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1, finished: 1

sheepdog003
Aug 08 11:00:59 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 11:00:59 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 11:00:59 [main] send_join_request(1095) IPv4 ip:192.168.6.43 port:7000
Aug 08 11:00:59 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 11:00:59 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 11:01:02 [main] init_vdi_state(195) failed to read inode header 80c8d12c00000000 0
Aug 08 11:01:02 [main] init_vdi_state(195) failed to read inode header 80c8d13900000000 0
Aug 08 11:01:02 [main] check_host_env(405) Allowed core file size 0, suggested unlimited
Aug 08 11:01:02 [main] main(790) sheepdog daemon (version 0.6.0_62_gdff7a77) started
Aug 08 11:01:02 [main] update_cluster_info(871) status = 4, epoch = 1, finished: 0
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1, finished: 1

sheepdog004
Aug 08 11:01:14 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 1
Aug 08 11:01:14 [main] md_add_disk(161) /mnt/sheep/dsk04, nr 2
Aug 08 11:01:14 [main] send_join_request(1095) IPv4 ip:192.168.6.44 port:7000
Aug 08 11:01:14 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 11:01:14 [main] for_each_object_in_stale(403) /mnt/sheep/dsk04/.stale
Aug 08 11:01:16 [main] check_host_env(405) Allowed core file size 0, suggested unlimited
Aug 08 11:01:16 [main] main(790) sheepdog daemon (version 0.6.0_62_gdff7a77) started
Aug 08 11:01:16 [main] update_cluster_info(871) status = 1, epoch = 1, finished: 0
Aug 08 11:01:20 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found