[sheepdog-users] Corosync + Sheepdog - wait for a next dispatch event
Fabian Zimmermann
dev.faz at gmail.com
Sun May 25 13:53:59 CEST 2014
Hi,
I'm using version 0.7.5 (but this also happens with 0.8.1).
If I hard-reset one of my nodes, the cluster stops working and blocks
all further IO.
Here is what I have found out so far:
-- Node reset
==> syslog <==
May 25 13:43:28 node2 corosync[2982]: [TOTEM ] A new membership
(192.168.20.21:292) was formed. Members left: 1084757015
May 25 13:43:28 node2 corosync[2982]: [QUORUM] Members[4]: 1084757013
1084757014 1084757016 1084757017
May 25 13:43:28 node2 corosync[2982]: [MAIN ] Completed service
synchronization, ready to provide service.
==> /var/lib/sheepdog/sheep.log <==
May 25 13:43:28 DEBUG [main] cdrv_cpg_confchg(553) mem:4, joined:0, left:1
May 25 13:43:28 DEBUG [main] __corosync_dispatch(371) wait for a next
dispatch event
--> IO frozen
-- Node powered on and sheepdog started
May 25 13:47:43 DEBUG [main] cdrv_cpg_confchg(553) mem:5, joined:1, left:0
May 25 13:47:43 DEBUG [main] sd_leave_handler(907) leave IPv4
ip:192.168.20.23 port:7000
May 25 13:47:43 DEBUG [main] sd_leave_handler(909) [0] IPv4
ip:192.168.20.21 port:7000
May 25 13:47:43 DEBUG [main] sd_leave_handler(909) [1] IPv4
ip:192.168.20.25 port:7000
May 25 13:47:43 DEBUG [main] sd_leave_handler(909) [2] IPv4
ip:192.168.20.24 port:7000
May 25 13:47:43 DEBUG [main] sd_leave_handler(909) [3] IPv4
ip:192.168.20.22 port:7000
May 25 13:47:43 DEBUG [main] recalculate_vnodes(625) node 7000 has 64
vnodes, free space 3971959803904
May 25 13:47:43 DEBUG [main] recalculate_vnodes(625) node 7000 has 64
vnodes, free space 3971959803904
May 25 13:47:43 DEBUG [main] recalculate_vnodes(625) node 7000 has 64
vnodes, free space 3971959803904
May 25 13:47:43 DEBUG [main] recalculate_vnodes(625) node 7000 has 64
vnodes, free space 3971959803904
May 25 13:47:43 DEBUG [main] update_epoch_log(26) update epoch: 32, 4
May 25 13:47:43 DEBUG [rw] prepare_object_list(759) 32
May 25 13:47:43 DEBUG [rw] wait_get_vdis_done(593) waiting for vdi list
May 25 13:47:43 DEBUG [rw] wait_get_vdis_done(600) vdi list ready
May 25 13:47:43 DEBUG [main] sockfd_cache_del_node(509)
192.168.20.23:7000, count 4
May 25 13:47:43 DEBUG [rw] fetch_object_list(673) 192.168.20.24:7000
May 25 13:47:43 DEBUG [rw] sockfd_cache_get_long(372)
192.168.20.24:7000, idx 0
May 25 13:47:43 DEBUG [main] cdrv_cpg_deliver(448) 0
May 25 13:47:43 DEBUG [main] sd_join_handler(750) check IPv4
ip:192.168.20.23 port:7000, 1
May 25 13:47:43 DEBUG [main] sd_join_handler(763) 192.168.20.23:7000:
cluster_status = 0x1
May 25 13:47:43 DEBUG [main] cdrv_cpg_deliver(448) 1
May 25 13:47:43 DEBUG [main] sd_accept_handler(886) join IPv4
ip:192.168.20.23 port:7000
May 25 13:47:43 DEBUG [main] sd_accept_handler(888) [0] IPv4
ip:192.168.20.21 port:7000
May 25 13:47:43 DEBUG [main] sd_accept_handler(888) [1] IPv4
ip:192.168.20.25 port:7000
May 25 13:47:43 DEBUG [main] sd_accept_handler(888) [2] IPv4
ip:192.168.20.24 port:7000
May 25 13:47:43 DEBUG [main] sd_accept_handler(888) [3] IPv4
ip:192.168.20.22 port:7000
May 25 13:47:43 DEBUG [main] sd_accept_handler(888) [4] IPv4
ip:192.168.20.23 port:7000
From my point of view, it looks as if the "leave" event is not handled
until the "join" triggers another dispatch event. Is that correct?
Thanks a lot,
Fabian