[sheepdog-users] gateway crashing if 1 node fail

Liu Yuan namei.unix at gmail.com
Wed Jul 4 09:31:02 CEST 2012


On 07/04/2012 03:15 PM, Alexandre DERUMIER wrote:
> Hi,
> I'm using a cluster of 3 servers,
> each running 1 gateway-only sheep daemon (-g -p 7000) and 1 sheep daemon for disk storage (-p 7001).
> 
> server1 :10.6.0.100
> server2 :10.6.0.101
> server3 :10.6.0.102
> 
> cluster is formatted with:
> collie cluster format --copies=3
> 
> 
> I launch a fio benchmark from the VM, using gateway 10.6.0.100:7000, then kill the sheep daemon on 10.6.0.102:7001.
> 
> The gateway (10.6.0.100:7000) then crashes after failing to connect to 10.6.0.102:7001.
> The other daemons work fine.
> Any ideas?
> 
> Jul 04 09:03:38 [gateway 160310] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160313] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160310] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160312] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160313] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160316] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160312] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160315] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160314] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160317] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160316] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160315] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160318] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160314] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160317] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160321] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160319] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160318] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160323] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160320] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160321] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160324] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160319] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160323] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160320] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160322] do_read(268) failed to read from socket: 0
> Jul 04 09:03:38 [gateway 160326] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160324] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160327] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160322] wait_forward_write(179) remote node might have gone away
> Jul 04 09:03:38 [gateway 160328] do_write(301) failed to write to socket: Broken pipe
> Jul 04 09:03:38 [gateway 160326] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160327] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160328] send_req(337) failed to send request 3, 4096: Broken pipe
> Jul 04 09:03:38 [gateway 160329] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160330] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160333] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160334] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160332] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160331] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160335] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160337] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160338] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [gateway 160336] connect_to(234) failed to connect to 10.6.0.102:7001: Connection refused
> Jul 04 09:03:38 [main] crash_handler(408) sheep pid 2148 exited unexpectedly.
> 

Hi Alexandre,

   Can you find a file named 'core' in your /store directory? If so,
please run 'gdb sheep /store/core', type the 'where' command, and
paste the output to the list.
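
If it is easier to capture the backtrace non-interactively, the gdb step
above could be scripted roughly as follows. This is only a sketch: it
assumes the core file is at /store/core, that 'sheep' and 'gdb' are on
the PATH, and that gdb's standard '-batch' and '-ex' options are
available. The helper names build_gdb_command and collect_backtrace are
made up for illustration.

```python
import os
import subprocess
import sys


def build_gdb_command(binary="sheep", core="/store/core"):
    """Return a gdb command line that prints a backtrace and exits."""
    # -batch makes gdb exit once the commands given with -ex have run;
    # 'where' is the backtrace command requested above.
    return ["gdb", "-batch", "-ex", "where", binary, core]


def collect_backtrace(binary="sheep", core="/store/core"):
    """Run gdb on the core file; return its output, or None if no core exists."""
    if not os.path.exists(core):
        return None
    result = subprocess.run(
        build_gdb_command(binary, core),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep gdb warnings together with the backtrace
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = collect_backtrace()
    if output is None:
        sys.exit("no core file found at /store/core")
    print(output)
```

Redirecting the script's output to a file gives something that can be
pasted to the list as-is.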

   Also, could you run sheep with the '-d' option to get more debug
output, and attach that output to the mailing list?
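
Debug logs can get long, so a short summary alongside the full attachment
may help readers spot the pattern. The sketch below counts messages per
reporting function. It assumes every line follows the format of the
excerpt quoted above ("[thread] function(line) message"); the names
LINE_RE and summarize are illustrative only and not part of sheepdog.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   Jul 04 09:03:38 [gateway 160310] do_read(268) failed to read from socket: 0
LINE_RE = re.compile(r"\[(?P<thread>[^\]]+)\]\s+(?P<func>\w+)\((?P<line>\d+)\)\s+(?P<msg>.*)")


def summarize(lines):
    """Count log messages per reporting function, skipping unparsable lines."""
    counts = Counter()
    for raw in lines:
        match = LINE_RE.search(raw)
        if match:
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python summarize_log.py < sheep.log
    for func, count in summarize(sys.stdin).most_common():
        print("{:6d}  {}".format(count, func))
```

This is a supplement to the full log, not a replacement for it.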

Thanks,
Yuan


