[sheepdog-users] [corosync] Single disk getting full

Valerio Pachera sirio81 at gmail.com
Thu Aug 8 11:54:29 CEST 2013


2013/8/8 Jan Friesse <jfriesse at redhat.com>:
> Few retransmits of packets is pretty normal because of UDP.

I cut the full list of messages short; there were 2499 of them (see below).

> - how often you get this messages?

Months ago I had this problem with an earlier version of corosync
(1.4.3, shipped with Debian).
I would have to search the mail archive to find the exact date, but I guess
it doesn't matter much.
It is certainly not something regular.
As you can see here, I got a single message in syslog.7.gz (1 August):

root@sheepdog001:~# zgrep -c '\[TOTEM \] Retransmit List' /var/log/syslog.*
/var/log/syslog.1:2499
/var/log/syslog.2.gz:0
/var/log/syslog.3.gz:0
/var/log/syslog.4.gz:0
/var/log/syslog.5.gz:0
/var/log/syslog.6.gz:0
/var/log/syslog.7.gz:1
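
For anyone who wants to repeat the same count on several nodes without zgrep, here is a minimal Python sketch. It only counts lines containing the literal string from the command above; the function names and the default directory are illustrative assumptions, not part of corosync or sheepdog.

```python
import gzip
from pathlib import Path

# Literal text matched by the zgrep command above.
NEEDLE = "[TOTEM ] Retransmit List"


def count_matching_lines(path, needle=NEEDLE):
    """Count lines containing `needle` in a plain or gzip-compressed log file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


def count_per_file(directory="/var/log", pattern="syslog.*"):
    """Return {file name: matching line count} for every rotated log found."""
    files = sorted(Path(directory).glob(pattern))
    return {str(path): count_matching_lines(path) for path in files}
```

Calling `count_per_file()` as a user allowed to read /var/log returns one entry per rotated syslog file, comparable to the `file:count` lines printed by `zgrep -c`.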


>   - Every time node starts?

I just restarted the cluster.
I got no '[TOTEM ] Retransmit List' messages in the syslog of any node.

>   - After ~two minutes of running?

No.

> - Isn't there any big IO/CPU load causing corosync to not to be
> scheduled properly?

As it does every day, the cluster received lots of data on a guest named 'backup'.
I don't think anything happened that was different from other days, except
the call trace message, but that was well before the crash.
I can't tell you whether one of the two running guests went crazy and consumed
too many resources.

Yuan, please tell me if you see anything to worry about in the
sheep.log after the restart.

sheepdog001 (the first one started)
Aug 08 10:29:56 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:29:56 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:29:57 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:29:57 [main] send_join_request(1095) IPv4 ip:192.168.6.41 port:7000
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:29:57 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:29:57 [main] init_vdi_state(195) failed to read inode header
800e4aa600000000 0
Aug 08 10:29:57 [main] init_vdi_state(195) failed to read inode header
80c8d12e00000000 0
Aug 08 10:30:00 [main] init_vdi_state(195) failed to read inode header
80c8d13700000000 0
Aug 08 10:30:01 [main] init_vdi_state(195) failed to read inode header
80f131b700000000 0
Aug 08 10:30:01 [main] init_vdi_state(195) failed to read inode header
80c8d13e00000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header
80c8d13600000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header
80c8d12800000000 0
Aug 08 10:30:02 [main] init_vdi_state(195) failed to read inode header
80c8d14400000000 0
Aug 08 10:30:02 [main] check_host_env(405) Allowed core file size 0,
suggested unlimited
Aug 08 10:30:02 [main] main(790) sheepdog daemon (version
0.6.0_62_gdff7a77) started
Aug 08 10:30:02 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 0
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:30:43 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:30:43 [main] send_join_request(1095) IPv4 ip:192.168.6.41 port:7000
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:30:43 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
800e4aa600000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d12e00000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d13700000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80f131b700000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d13e00000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d13600000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d12800000000 0
Aug 08 10:30:43 [main] init_vdi_state(195) failed to read inode header
80c8d14400000000 0
Aug 08 10:30:43 [main] check_host_env(405) Allowed core file size 0,
suggested unlimited
Aug 08 10:30:43 [main] main(790) sheepdog daemon (version
0.6.0_62_gdff7a77) started
Aug 08 10:30:43 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 0
Aug 08 10:31:20 [main] sd_check_join_cb(1055) 192.168.6.42:7000: ret =
0x0, cluster_status = 0x4
Aug 08 10:31:20 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 1
Aug 08 11:00:59 [main] sd_check_join_cb(1055) 192.168.6.43:7000: ret =
0x0, cluster_status = 0x4
Aug 08 11:00:59 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 1
Aug 08 11:01:14 [main] sd_check_join_cb(1055) 192.168.6.44:7000: ret =
0x0, cluster_status = 0x1
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1,
finished: 1

sheepdog002
Aug 07 21:09:54 [main] cdrv_cpg_confchg(602) PANIC: Network partition
is detected
Aug 07 21:09:54 [main] crash_handler(181) sheep exits unexpectedly (Aborted).
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:183: crash_handler
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7fd1e9afd02f]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7fd1e9109474]
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7fd1e910c6ef]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:602: cdrv_cpg_confchg
Aug 07 21:09:54 [main] sd_backtrace(848)
/usr/lib/libcpg.so.4(cpg_dispatch+0x594) [0x7fd1e9668d74]
Aug 07 21:09:54 [main] sd_backtrace(834) corosync.c:744: corosync_handler
Aug 07 21:09:54 [main] sd_backtrace(834) event.c:209: do_event_loop
Aug 07 21:09:54 [main] sd_backtrace(834) sheep.c:795: main
Aug 07 21:09:54 [main] sd_backtrace(848)
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc)
[0x7fd1e90f5eac]
Aug 07 21:09:54 [main] sd_backtrace(848) sheep() [0x405498]
Aug 07 21:09:54 [main] __dump_stack_frames(744) cannot find gdb
Aug 07 21:09:54 [main] __sd_dump_variable(694) cannot find gdb
Aug 07 21:09:54 [main] crash_handler(487) sheep pid 20833 exited unexpectedly.
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 10:31:20 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 3
Aug 08 10:31:20 [main] send_join_request(1095) IPv4 ip:192.168.6.42 port:7000
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 10:31:20 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 10:31:21 [main] check_host_env(405) Allowed core file size 0,
suggested unlimited
Aug 08 10:31:21 [main] main(790) sheepdog daemon (version
0.6.0_62_gdff7a77) started
Aug 08 10:31:21 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 0
Aug 08 11:00:59 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 1
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1,
finished: 1


sheepdog003
Aug 08 11:00:59 [main] md_add_disk(161) /mnt/sheep/dsk01/obj, nr 1
Aug 08 11:00:59 [main] md_add_disk(161) /mnt/sheep/dsk02, nr 2
Aug 08 11:00:59 [main] send_join_request(1095) IPv4 ip:192.168.6.43 port:7000
Aug 08 11:00:59 [main] for_each_object_in_stale(403) /mnt/sheep/dsk01/obj/.stale
Aug 08 11:00:59 [main] for_each_object_in_stale(403) /mnt/sheep/dsk02/.stale
Aug 08 11:01:02 [main] init_vdi_state(195) failed to read inode header
80c8d12c00000000 0
Aug 08 11:01:02 [main] init_vdi_state(195) failed to read inode header
80c8d13900000000 0
Aug 08 11:01:02 [main] check_host_env(405) Allowed core file size 0,
suggested unlimited
Aug 08 11:01:02 [main] main(790) sheepdog daemon (version
0.6.0_62_gdff7a77) started
Aug 08 11:01:02 [main] update_cluster_info(871) status = 4, epoch = 1,
finished: 0
Aug 08 11:01:14 [main] update_cluster_info(871) status = 1, epoch = 1,
finished: 1


sheepdog004
Aug 08 11:01:14 [main] md_add_disk(161) /mnt/sheep/dsk03, nr 1
Aug 08 11:01:14 [main] md_add_disk(161) /mnt/sheep/dsk04, nr 2
Aug 08 11:01:14 [main] send_join_request(1095) IPv4 ip:192.168.6.44 port:7000
Aug 08 11:01:14 [main] for_each_object_in_stale(403) /mnt/sheep/dsk03/.stale
Aug 08 11:01:14 [main] for_each_object_in_stale(403) /mnt/sheep/dsk04/.stale
Aug 08 11:01:16 [main] check_host_env(405) Allowed core file size 0,
suggested unlimited
Aug 08 11:01:16 [main] main(790) sheepdog daemon (version
0.6.0_62_gdff7a77) started
Aug 08 11:01:16 [main] update_cluster_info(871) status = 1, epoch = 1,
finished: 0
Aug 08 11:01:20 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
Aug 08 11:01:21 [gway 20432] sheep_exec_req(548) failed No object found
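
The logs above are wrapped by the mail client, so an entry such as update_cluster_info is split over two lines. To compare the status changes across nodes, something like the following Python sketch could be run on one node's log text. It assumes every entry starts with a timestamp of the form "Aug 08 10:29:56" and that lines without one are continuations; the function names are hypothetical, and no meaning is assigned to the numeric status values.

```python
import re

# An entry starts with a timestamp such as "Aug 08 10:29:56 [".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \[")
CLUSTER_INFO = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) .*update_cluster_info\(\d+\) "
    r"status = (?P<status>\d+), epoch = (?P<epoch>\d+), finished: (?P<finished>\d+)"
)


def join_wrapped(lines):
    """Merge continuation lines (no leading timestamp) into the entry above."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def cluster_info_changes(lines):
    """Return (timestamp, status, epoch, finished) for each update_cluster_info entry."""
    changes = []
    for entry in join_wrapped(lines):
        match = CLUSTER_INFO.match(entry)
        if match:
            changes.append((
                match["stamp"],
                int(match["status"]),
                int(match["epoch"]),
                int(match["finished"]),
            ))
    return changes
```

Feeding it the lines of a single sheep.log, for example `cluster_info_changes(open("sheep.log"))`, gives one tuple per status update, which makes it easier to line up the timestamps from the four nodes.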


