[sheepdog] Network partition issue in sheepdog cluster

tuji tuji at atworks.co.jp
Fri Jul 25 03:45:06 CEST 2014


Hi

I have seen the same problem under high CPU load.

I solved it by specifying the '-r' option for corosync.
'-r' is a new option in corosync 2.x.

If you are using corosync 2.x, try the -r option.
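For anyone who starts corosync from a script, here is a minimal Python sketch of that advice: pass -r only when the installed corosync is 2.x, since -r is new in 2.x. It assumes that `corosync -v` prints a banner containing something like "version '2.3.3'" (check your own build), and the helper names `corosync_major` and `corosync_command` are hypothetical.

```python
import re
from typing import List

# Assumption: the `corosync -v` banner contains "version 'X.Y'" or
# "version 'X.Y.Z'"; adjust this pattern if your build prints it differently.
VERSION_RE = re.compile(r"version '(\d+)\.(\d+)(?:\.(\d+))?'")


def corosync_major(banner: str) -> int:
    """Return the major version number found in a corosync version banner."""
    m = VERSION_RE.search(banner)
    if m is None:
        raise ValueError("no corosync version found in: %r" % banner)
    return int(m.group(1))


def corosync_command(banner: str) -> List[str]:
    """Build a corosync command line, adding '-r' only for corosync 2.x.

    '-r' is the option suggested in this thread; it is new in 2.x,
    so it is not passed to older (e.g. 1.x) versions.
    """
    cmd = ["corosync"]
    if corosync_major(banner) == 2:
        cmd.append("-r")
    return cmd


if __name__ == "__main__":
    banner = "Corosync Cluster Engine, version '2.3.3'"
    print(corosync_command(banner))  # ['corosync', '-r']
```

On a 1.x install the sketch returns the plain ['corosync'] command, so it stays safe to use on mixed clusters.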



> Hi, All
>       Over the past few days I tried to set up a sheepdog cluster and finally created one. However, the sheepdog daemon stops on some nodes because of a network partition. Has anyone encountered the same issue?
>       I look forward to your help, and thank you very much.
> 
> 
>       My cluster has 5 nodes running CentOS 6.5, and the sheepdog version is 0.8.2.
>       The error log in the sheep.log file is as follows:
> 
> 
> Jul 04 14:00:32  EMERG [main] cdrv_cpg_confchg(573) PANIC: Network partition is detected
> Jul 04 14:00:32  EMERG [main] crash_handler(267) sheep exits unexpectedly (Aborted).
> Jul 04 14:00:32  EMERG [main] sd_backtrace(833) sheep.c:269: crash_handler
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) /lib64/libpthread.so.0(+0xf70f) [0x7f29f7e8970f]
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x7f29f748b924]
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x7f29f748d104]
> Jul 04 14:00:32  EMERG [main] sd_backtrace(833) corosync.c:573: cdrv_cpg_confchg
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) /usr/lib64/libcpg.so.4(cpg_dispatch+0x451) [0x7f29f79f2d51]
> Jul 04 14:00:32  EMERG [main] sd_backtrace(833) corosync.c:703: corosync_handler
> Jul 04 14:00:32  EMERG [main] sd_backtrace(833) event.c:210: do_event_loop
> Jul 04 14:00:32  EMERG [main] sd_backtrace(833) sheep.c:931: main
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) /lib64/libc.so.6(__libc_start_main+0xfc) [0x7f29f7477d1c]
> Jul 04 14:00:32  EMERG [main] sd_backtrace(847) sheep() [0x403e68]
> 
> 
> 
> 
> Best Regards
> Lifeng

--------------------------
Masahiro Tsuji

A.T.WORKS, INC
URL http://www.atworks.co.jp