[sheepdog-users] gateway crashing if 1 node fail
Liu Yuan
namei.unix at gmail.com
Wed Jul 4 09:31:02 CEST 2012
On 07/04/2012 03:15 PM, Alexandre DERUMIER wrote:
> Hi,
> I'm using a cluster of 3 servers,
> each running 1 gateway-only sheep daemon (-g -p 7000) and 1 sheep daemon for disk storage (-p 7001).
>
> server1: 10.6.0.100
> server2: 10.6.0.101
> server3: 10.6.0.102
>
> cluster is formatted with:
> collie cluster format --copies=3
>
>
> I launch a fio benchmark from the VM, using gateway 10.6.0.100:7000, then kill the sheep daemon on 10.6.0.102:7001.
>
> The gateway (10.6.0.100:7000) then crashes after failing to connect to 10.6.0.102:7001.
> The other daemons work fine.
> Any idea?
>
> Jul 04 09:03:38 [gateway 160310] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160313] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160310] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160312] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160313] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160316] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160312] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160315] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160314] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160317] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160316] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160315] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160318] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160314] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160317] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160321] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160319] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160318] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160323] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160320] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160321] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160324] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160319] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160323] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160320] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160322] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160326] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160324] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160327] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160322] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160328] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160326] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160327] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160328] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160329] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160330] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160333] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160334] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160332] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160331] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160335] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160337] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160338] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160336] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [main] crash_handler(408) sheep pid 2148 exited unexpectedly.
>
Hi Alexandre,
Can you find a file named 'core' in your /store directory? If so,
please run 'gdb sheep /store/core', type the 'where' command, and
paste the output to the list.
Also, could you run sheep with the '-d' option to get more debug output
and attach that output to the mailing list?
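
If it is easier, the gdb step can probably be run non-interactively so
the backtrace lands directly in a file. This is only a sketch: the
function name sheep_backtrace is made up for illustration, /store/core
is the path assumed in this thread, and it relies on gdb's standard
-batch and -ex options.

```shell
# Hypothetical helper: print a backtrace from a sheep core dump.
# $1 = core file (defaults to /store/core, as assumed in this thread)
# $2 = sheep binary that produced the core (defaults to "sheep")
sheep_backtrace() {
    core="${1:-/store/core}"
    sheep_bin="${2:-sheep}"
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    # -batch: exit once the commands finish; -ex: run one gdb command.
    gdb -batch -ex where "$sheep_bin" "$core"
}
```

For example, 'sheep_backtrace /store/core > sheep-backtrace.txt 2>&1'
should leave the same output as the interactive 'where' command in
sheep-backtrace.txt, ready to paste.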
Thanks,
Yuan