[sheepdog-users] gateway crashing if 1 node fail

Alexandre DERUMIER aderumier at odiso.com
Wed Jul 4 10:53:53 CEST 2012


I have reproduced the gateway crash bug:
collie cluster format --copies=2

# collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.100:7000         0 1677723146
-    1   10.6.0.100:7001        64 1677723146
-    2   10.6.0.101:7000         0 1694500362
-    3   10.6.0.101:7001        64 1694500362
-    4   10.6.0.102:7000         0 1711277578
-    5   10.6.0.102:7001        64 1711277578


On 10.6.0.101, I killed both sheep daemons (:7000 gateway, :7001 storage):
killall -9 sheep

Then both daemons on 10.6.0.100 crashed:

# collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.102:7000         0 1711277578
-    1   10.6.0.102:7001        64 1711277578

10.6.0.100:7000 logs

Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044447] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044446] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044448] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044449] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044450] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044451] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044452] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044453] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044455] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044454] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044456] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044457] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044457] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044443] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044452] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.
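
To summarize a gateway log like the one above, a short script can tally the connection failures per target. This is only a sketch: it assumes the log lines look exactly like the ones pasted here, and the function name count_connect_failures is made up for illustration (it is not part of sheepdog).

```python
import re
from collections import Counter

# Assumption: failure lines follow the format shown in the log above, e.g.
# "... connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused"
CONNECT_FAILURE = re.compile(
    r"connect_to\(\d+\) failed to connect to (?P<target>\S+): (?P<reason>.+)$"
)


def count_connect_failures(lines):
    """Count connect_to failures, keyed by (target address, reason)."""
    counts = Counter()
    for line in lines:
        match = CONNECT_FAILURE.search(line.strip())
        if match:
            counts[(match.group("target"), match.group("reason"))] += 1
    return counts


# Three lines copied from the log above; only the first is a connect failure.
sample = [
    "Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused",
    "Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41",
    "Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.",
]

for (target, reason), n in count_connect_failures(sample).items():
    print(f"{target}: {reason} x{n}")
```

On the full log this would show that every refused connection targets 10.6.0.101:7001, the storage daemon that was killed.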

10.6.0.100:7001: no logs...


I did not find any core file in /var/lib/sheepdog or /var/lib/sheepdoggateway (for the gateway daemon).


----- Original Message -----

From: "Liu Yuan" <namei.unix at gmail.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: sheepdog-users at lists.wpkg.org
Sent: Wednesday, 4 July 2012 10:29:12
Subject: Re: [sheepdog-users] gateway crashing if 1 node fail

On 07/04/2012 04:21 PM, Alexandre DERUMIER wrote: 
>> copies=3 means you should have at least 3 sheep daemons available, or
>> the cluster will go to halted state (not serving any IO requests at all)
> OK, thanks, I didn't know that. It's clearer now.
> 
> 

If you want the cluster to proceed even without enough daemons, you can 
pass -H or --nohalt to 'collie cluster format' like 

$ collie cluster format -H -c 3 

Thanks, 
Yuan 
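
The halting rule described in the reply above can be written down as a tiny model. This is a simplified sketch of the rule as stated in this thread, not sheepdog's actual implementation; the function name and parameters are hypothetical, and the assumption that a nohalt cluster still needs at least one daemon is mine.

```python
def cluster_can_serve_io(available_sheep: int, copies: int, nohalt: bool = False) -> bool:
    """Simplified model of the rule from this thread (not sheepdog code).

    Without nohalt, the cluster halts when fewer than `copies` sheep
    daemons are available. With nohalt (format option -H / --nohalt),
    it keeps going; requiring at least one daemon is an assumption.
    """
    if nohalt:
        return available_sheep >= 1
    return available_sheep >= copies


# copies=3 with only 2 daemons: halted unless the cluster was formatted with -H
print(cluster_can_serve_io(2, copies=3))
print(cluster_can_serve_io(2, copies=3, nohalt=True))
```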



-- 

Alexandre Derumier
Systems and Network Engineer

Phone (landline): 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
