[sheepdog-users] gateway crashing if 1 node fail
    Alexandre DERUMIER 
    aderumier at odiso.com
       
    Wed Jul  4 10:53:53 CEST 2012
    
    
  
I have reproduced the gateway crash bug:
collie cluster format --copies=2
# collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.100:7000         0 1677723146
-    1   10.6.0.100:7001        64 1677723146
-    2   10.6.0.101:7000         0 1694500362
-    3   10.6.0.101:7001        64 1694500362
-    4   10.6.0.102:7000         0 1711277578
-    5   10.6.0.102:7001        64 1711277578
On 10.6.0.101, I killed both sheep daemons (:7000 gateway, :7001 storage):
killall -9 sheep
Then the two daemons on 10.6.0.100 crashed:
# collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.102:7000         0 1711277578
-    1   10.6.0.102:7001        64 1711277578
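The two listings above can be compared mechanically. The following is a minimal Python sketch, not part of sheepdog or collie: the helper names `parse_node_list` and `missing_nodes` are hypothetical, and it assumes the column layout shown in the `collie node list` output above, with 0 in the V-Nodes column taken to mean a gateway-only daemon as in this setup.

```python
import re

# Matches rows such as "-    1   10.6.0.100:7001        64 1677723146".
# The header row does not match because its "Id" column is not numeric.
ROW = re.compile(r"^\S+\s+(\d+)\s+(\d+(?:\.\d+){3}):(\d+)\s+(\d+)\s+(\d+)$")

def parse_node_list(text):
    """Return {"host:port": vnodes} for every node row in the listing."""
    nodes = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            _node_id, host, port, vnodes, _zone = m.groups()
            nodes[f"{host}:{port}"] = int(vnodes)
    return nodes

def missing_nodes(before, after):
    """Nodes present in the first listing but absent from the second."""
    return sorted(set(before) - set(after))

# Listings copied from this report.
BEFORE = """\
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.100:7000         0 1677723146
-    1   10.6.0.100:7001        64 1677723146
-    2   10.6.0.101:7000         0 1694500362
-    3   10.6.0.101:7001        64 1694500362
-    4   10.6.0.102:7000         0 1711277578
-    5   10.6.0.102:7001        64 1711277578
"""
AFTER = """\
M   Id   Host:Port         V-Nodes       Zone
-    0   10.6.0.102:7000         0 1711277578
-    1   10.6.0.102:7001        64 1711277578
"""

before = parse_node_list(BEFORE)
after = parse_node_list(AFTER)
lost = missing_nodes(before, after)
# Assumption: only daemons with V-Nodes > 0 store data.
storage_left = [node for node, vnodes in after.items() if vnodes > 0]
print("lost:", lost)
print("storage nodes left:", storage_left)
```

With the listings from this report, only the two daemons that were killed on 10.6.0.101 were expected to disappear, but the daemons on 10.6.0.100 are missing as well, leaving a single storage node for a cluster formatted with --copies=2.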
10.6.0.100:7000 logs
Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044447] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044446] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044448] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044449] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044450] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044451] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044452] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044453] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044455] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044454] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044456] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044457] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044457] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044443] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044452] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.
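For longer logs, the excerpt above can be condensed before posting. This is a rough Python sketch that only assumes the line format visible in this excerpt; `summarize` is a hypothetical helper, not a sheepdog tool, and it says nothing about the cause of the crash.

```python
import re
from collections import Counter

# Patterns derived only from the log lines shown in this report.
CONNECT_FAIL = re.compile(r"connect_to\(\d+\) failed to connect to (\S+): (.+)$")
CRASH = re.compile(r"crash_handler\(\d+\) sheep pid (\d+) exited unexpectedly")

def summarize(log_text):
    """Count connect failures per (target, reason) and collect crashed pids."""
    failures = Counter()
    crashed_pids = []
    for line in log_text.splitlines():
        m = CONNECT_FAIL.search(line)
        if m:
            failures[(m.group(1), m.group(2).strip())] += 1
            continue
        m = CRASH.search(line)
        if m:
            crashed_pids.append(int(m.group(1)))
    return failures, crashed_pids

# A shortened sample taken from the log above.
SAMPLE = """\
Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.
"""

failures, crashed_pids = summarize(SAMPLE)
for (target, reason), count in failures.items():
    print(f"{count} x {target}: {reason}")
print("crashed pids:", crashed_pids)
```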
10.6.0.100:7001: no logs...
I have not found any core file in /var/lib/sheepdog or /var/lib/sheepdoggateway (for the gateway daemon).
----- Original message -----
From: "Liu Yuan" <namei.unix at gmail.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: sheepdog-users at lists.wpkg.org
Sent: Wednesday, 4 July 2012 10:29:12
Subject: Re: [sheepdog-users] gateway crashing if 1 node fail
On 07/04/2012 04:21 PM, Alexandre DERUMIER wrote: 
>> copies=3 means you should have at least 3 sheep daemons available, or
>> the cluster will go to the halted state (not serving any IO requests at all)
> OK, thanks, I didn't know that. It's clearer now.
> 
> 
If you want the cluster to proceed even without enough daemons, you can 
pass -H or --nohalt to 'collie cluster format' like 
$ collie cluster format -H -c 3 
Thanks, 
Yuan 
-- 
Alexandre Derumier
Systems and Network Engineer
Landline: 03 20 68 88 85
Fax: 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix 
12 rue Marivaux 75002 Paris 
    
    