[sheepdog-users] gateway crashing if 1 node fail
Alexandre DERUMIER
aderumier at odiso.com
Wed Jul 4 10:53:53 CEST 2012
I have reproduced the gateway crash bug:
collie cluster format --copies=2
# collie node list
M Id Host:Port V-Nodes Zone
- 0 10.6.0.100:7000 0 1677723146
- 1 10.6.0.100:7001 64 1677723146
- 2 10.6.0.101:7000 0 1694500362
- 3 10.6.0.101:7001 64 1694500362
- 4 10.6.0.102:7000 0 1711277578
- 5 10.6.0.102:7001 64 1711277578
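To compare the node list before and after the failure, the output above can be parsed with a short script. This is only a rough sketch: the function name `parse_node_list` and the dictionary keys are made up for illustration, and treating "0 V-Nodes" as "gateway" is an assumption based on this setup, where the :7000 daemons are the gateways.

```python
def parse_node_list(text):
    """Parse the rows printed by 'collie node list' into dictionaries."""
    nodes = []
    for raw in text.splitlines():
        fields = raw.split()
        # Data rows look like: "- 0 10.6.0.100:7000 0 1677723146".
        # The header row and the command line itself are skipped.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        host, _, port = fields[2].rpartition(":")
        nodes.append({
            "id": int(fields[1]),
            "host": host,
            "port": int(port),
            "vnodes": int(fields[3]),
            "zone": int(fields[4]),
        })
    return nodes


sample = """\
# collie node list
M Id Host:Port V-Nodes Zone
- 0 10.6.0.100:7000 0 1677723146
- 1 10.6.0.100:7001 64 1677723146
- 2 10.6.0.101:7000 0 1694500362
- 3 10.6.0.101:7001 64 1694500362
- 4 10.6.0.102:7000 0 1711277578
- 5 10.6.0.102:7001 64 1711277578
"""

nodes = parse_node_list(sample)
# Assumption: a daemon with 0 V-Nodes acts as a gateway only.
gateways = [n for n in nodes if n["vnodes"] == 0]
storage = [n for n in nodes if n["vnodes"] > 0]
print(len(gateways), "gateway daemons,", len(storage), "storage daemons")
```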
On 10.6.0.101, I killed both sheep daemons (:7000 gateway, :7001 storage):
killall -9 sheep
Then the 2 daemons on 10.6.0.100 crashed:
# collie node list
M Id Host:Port V-Nodes Zone
- 0 10.6.0.102:7000 0 1711277578
- 1 10.6.0.102:7001 64 1711277578
Logs from 10.6.0.100:7000:
Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044447] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044446] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044448] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044449] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044450] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044451] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044452] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044453] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044455] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044454] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044456] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044457] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044457] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044443] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044452] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.
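To see at a glance which functions reported errors before the crash, the log lines can be tallied with a small script. This is a sketch only: the regular expression is inferred from the lines pasted above, not from any documented sheepdog log format, and `tally_by_function` is a hypothetical helper name.

```python
import re
from collections import Counter

# Inferred from lines such as:
#   Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect ...
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<func>\w+)\((?P<line>\d+)\) "
    r"(?P<msg>.*)$"
)


def tally_by_function(log_text):
    """Count log lines per reporting function; unparsable lines are ignored."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            counts[match.group("func")] += 1
    return counts


sample = """\
Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.
"""

counts = tally_by_function(sample)
for func, n in counts.most_common():
    print(func, n)
```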
10.6.0.100:7001: no logs...
I did not find any core file in /var/lib/sheepdog or /var/lib/sheepdoggateway (for the gateway daemon).
----- Original message -----
From: "Liu Yuan" <namei.unix at gmail.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: sheepdog-users at lists.wpkg.org
Sent: Wednesday, July 4, 2012 10:29:12
Subject: Re: [sheepdog-users] gateway crashing if 1 node fail
On 07/04/2012 04:21 PM, Alexandre DERUMIER wrote:
>>> copies=3 means you at least should have >= 3 sheep daemon available or
>>> the cluster will go to halted state (not serving any IO requests at all)
> OK, thanks, I didn't know that. It's clearer now.
>
>
If you want the cluster to proceed even without enough daemons, you can
pass -H or --nohalt to 'collie cluster format' like
$ collie cluster format -H -c 3
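The rule described in this thread can be written down as a tiny model. This is a sketch under stated assumptions, not sheepdog's actual code: the function name `cluster_serves_io` and its parameters are hypothetical, and it simply counts available daemons as described above. Requiring at least one daemon in the nohalt case is also an assumption.

```python
def cluster_serves_io(available_daemons, copies, nohalt=False):
    """Model of the halt rule as described in this thread.

    available_daemons: number of sheep daemons currently available
    copies:            value passed as --copies / -c at format time
    nohalt:            True if the cluster was formatted with -H / --nohalt
    """
    if nohalt:
        # Assumption: with nohalt the cluster keeps going as long as
        # at least one daemon is left.
        return available_daemons >= 1
    # Without nohalt, fewer daemons than copies means the cluster halts.
    return available_daemons >= copies


# copies=3 with only 2 daemons left: halted unless formatted with -H.
print(cluster_serves_io(2, 3))
print(cluster_serves_io(2, 3, nohalt=True))
```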
Thanks,
Yuan
--
Alexandre Derumier
Systems and Network Engineer
Landline: 03 20 68 88 85
Fax : 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris