[sheepdog-users] sheepdog with corosync crashed

Hitoshi Mitake mitake.hitoshi at gmail.com
Tue Jun 23 14:31:52 CEST 2015


At Tue, 23 Jun 2015 13:25:08 +0300,
Vasiliy Tolstov wrote:
> 
> I'm testing a sheepdog cluster (formatted with -c 8:3) with 2 nodes (I expect
> to add more nodes today)

You can't use an erasure-coded (EC) cluster without enough nodes. An
x:y format needs at least x+y nodes, so 8:3 needs 11.
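
As a minimal sketch of the x+y rule above (the helper names are
hypothetical and are not part of dog or sheep), the node requirement
for a given -c policy can be checked like this:

```python
from typing import Tuple


def parse_ec_policy(policy: str) -> Tuple[int, int]:
    """Split an 'x:y' erasure-coding policy into (data, parity) strip counts."""
    data, parity = policy.split(":")
    return int(data), int(parity)


def min_nodes(policy: str) -> int:
    """An x:y policy needs at least x + y nodes."""
    data, parity = parse_ec_policy(policy)
    return data + parity


def enough_nodes(policy: str, node_count: int) -> bool:
    """True if the cluster has enough nodes for the given policy."""
    return node_count >= min_nodes(policy)


if __name__ == "__main__":
    # The cluster in this thread: formatted with 8:3, but only 2 nodes.
    print(min_nodes("8:3"), enough_nodes("8:3", 2))
```

For the cluster in this thread, min_nodes("8:3") is 11, so two nodes
are far short of what the format requires.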

The error below comes from the corosync driver. sheepdog cannot
tolerate a network partition, because virtual disks must never become
inconsistent (other types of storage, e.g. key-value stores, can
handle a partition correctly). In your log, one of the two nodes left
(mem:1, left:1), which the driver treats as a possible partition. So
the driver panicked and stopped sheep intentionally.
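
To illustrate the idea, here is a rough sketch of a majority check on a
membership change. It is not the actual sheepdog source: the function
name and the exact threshold are assumptions, chosen only to agree with
the log line in this thread, where mem:1 and left:1 triggered the
panic.

```python
def partition_suspected(remaining: int, left: int) -> bool:
    """Guess whether a membership change could be a network partition.

    remaining: nodes still in the group after the change (mem in the log)
    left:      nodes that left in this change (left in the log)

    Assumption: losing half or more of the previous membership at once
    is treated as a possible partition, since the other half may still
    be alive and writing to the same virtual disks.
    """
    total_before = remaining + left
    return 2 * left >= total_before


if __name__ == "__main__":
    # Two-node cluster, one node leaves: the case from the log.
    print(partition_suspected(remaining=1, left=1))
    # Three-node cluster, one node leaves: a majority remains.
    print(partition_suspected(remaining=2, left=1))
```

Under this assumed rule, a two-node cluster can never lose a node
without the survivor stopping, which is another reason to test with
more nodes.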

Thanks,
Hitoshi

> dog cluster info
> Cluster status: running, auto-recovery enabled
> 
> Cluster created at Tue Jun 23 11:38:55 2015
> 
> Epoch Time           Version
> 2015-06-23 13:18:39      6 [192.168.240.132:7000, 192.168.240.133:7000]
> 2015-06-23 13:01:39      5 [192.168.240.133:7000]
> 2015-06-23 12:55:32      4 [192.168.240.132:7000, 192.168.240.133:7000]
> 2015-06-23 12:29:07      3 [192.168.240.133:7000]
> 2015-06-23 11:45:29      2 [192.168.240.132:7000, 192.168.240.133:7000]
> 2015-06-23 11:38:55      1 [192.168.240.133:7000]
> 
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> queue_request(486) READ_DEL_VDIS, 1
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [io
> 18013] do_process_work(1938) c9, 0, 4
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> client_handler(974) 4, 0
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> tx_main(887) 20, 127.0.0.1:56353
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> client_handler(974) 19, 0
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> clear_client_info(915) connection seems to be dead
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> clear_client_info(925) refcnt:0, fd:20, 127.0.0.1:56353
> Jun 23 12:55:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> destroy_client(906) connection from: 127.0.0.1:56353
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cdrv_cpg_deliver(431) 2
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> sd_leave_handler(1160) leave IPv4 ip:192.168.240.132 port:7000
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> sd_leave_handler(1162) IPv4 ip:192.168.240.133 port:7000
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> recalculate_vnodes(127) node IPv4 ip:192.168.240.133 port:7000 has 128
> vnodes, free space 2707208527872
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> update_epoch_log(26) update epoch: 5, 1
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> for_each_object_in_wd(453) Create 6 threads for all path
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] prepare_object_list(1055) 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] wait_get_vdis_done(802) waiting for vdi list
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] wait_get_vdis_done(809) vdi list ready
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> sockfd_cache_del_node(475) 192.168.240.132:7000, count 1
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] fetch_object_list(974) 192.168.240.133:7000
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] sockfd_cache_get_long(338) 192.168.240.133:7000, idx 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> client_handler(974) 1, 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> rx_main(835) 17, 192.168.240.133:46965
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> queue_request(486) GET_OBJ_LIST, 1
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [io
> 18070] do_process_work(1938) a1, 0, 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> client_handler(974) 4, 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> tx_main(887) 17, 192.168.240.133:46965
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] sockfd_cache_put_long(372) 192.168.240.133:7000 idx 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] fetch_object_list(999) 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [rw
> 18071] prepare_object_list(1084) 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> finish_recovery(777) recovery complete: new epoch 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> queue_request(486) COMPLETE_RECOVERY, 1
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> queue_cluster_request(316) COMPLETE_RECOVERY (0x3462d50)
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cdrv_cpg_deliver(431) 3
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> sd_notify_handler(930) op COMPLETE_RECOVERY, size: 176, from: IPv4
> ip:192.168.240.133 port:7000
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cluster_recovery_completion(702) new epoch 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cluster_recovery_completion(710) IPv4 ip:192.168.240.133 port:7000 is
> recovered at epoch 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cluster_recovery_completion(712) [0] IPv4 ip:192.168.240.133 port:7000
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]: NOTICE [main]
> cluster_recovery_completion(726) all nodes are recovered, epoch 5
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> purge_work_done(397) purging work done, number of units: 0
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]: message
> repeated 5 times: [  DEBUG [main] purge_work_done(397) purging work
> done, number of units: 0]
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> cdrv_cpg_confchg(537) mem:1, joined:0, left:1
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> cdrv_cpg_confchg(550) PANIC: a number of leaving node (1) is larger
> than majority (1), network partition
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> crash_handler(268) sheep exits unexpectedly (Aborted).
> Jun 23 13:01:39 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x406af7]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf09f)
> [0x7f876635e09f]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34)
> [0x7f87654e8164]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f)
> [0x7f87654eb3df]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x429930]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x43e960]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x428a19]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x4303e8]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x4061b4]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847)
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfc)
> [0x7f87654d4eac]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  EMERG [main]
> sd_backtrace(847) /usr/sbin/sheep() [0x406928]
> Jun 23 13:01:40 cn33.z1.mn2.simplecloud.ru sheep[18010]:  DEBUG [main]
> gdb_cmd(767) cannot find gdb
> 
> -- 
> Vasiliy Tolstov,
> e-mail: v.tolstov at selfip.ru