I have reproduced the gateway crash bug:

# collie cluster format --copies=2

# collie node list
M   Id   Host:Port         V-Nodes   Zone
-    0   10.6.0.100:7000         0   1677723146
-    1   10.6.0.100:7001        64   1677723146
-    2   10.6.0.101:7000         0   1694500362
-    3   10.6.0.101:7001        64   1694500362
-    4   10.6.0.102:7000         0   1711277578
-    5   10.6.0.102:7001        64   1711277578

On 10.6.0.101, I killed the 2 sheep daemons (:7000 gateway, :7001 storage):

killall -9 sheep

Then the 2 daemons on 10.6.0.100 crashed:

# collie node list
M   Id   Host:Port         V-Nodes   Zone
-    0   10.6.0.102:7000         0   1711277578
-    1   10.6.0.102:7001        64   1711277578

10.6.0.100:7000 logs:

Jul 04 10:42:42 [gateway 1044443] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044444] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044447] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044446] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044448] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044449] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044450] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044451] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044452] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044453] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044455] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044454] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044456] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044457] connect_to(234) failed to connect to 10.6.0.101:7001: Connection refused
Jul 04 10:42:42 [gateway 1044445] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044457] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044443] wait_forward_write(187) fail 41
Jul 04 10:42:42 [gateway 1044452] wait_forward_write(187) fail 41
Jul 04 10:42:43 [main] crash_handler(408) sheep pid 19842 exited unexpectedly.

10.6.0.100:7001: no logs...

I didn't find any core file in /var/lib/sheepdog or /var/lib/sheepdoggateway (for the gateway daemon).

----- Original Message -----

From: "Liu Yuan" <namei.unix at gmail.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: sheepdog-users at lists.wpkg.org
Sent: Wednesday, July 4, 2012 10:29:12
Subject: Re: [sheepdog-users] gateway crashing if 1 node fail

On 07/04/2012 04:21 PM, Alexandre DERUMIER wrote:
>>> copies=3 means you at least should have >= 3 sheep daemon available or
>>> the cluster will go to halted state (not serving any IO requests at all)
> Ok, thanks, I didn't know that. It's clearer now.
>

If you want the cluster to proceed even without enough daemons, you can pass -H or --nohalt to 'collie cluster format' like

$ collie cluster format -H -c 3

Thanks,
Yuan

--
Alexandre Derumier
Systems and Network Engineer
Phone: 03 20 68 88 85
Fax: 03 20 68 90 88
45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
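
As a rough illustration of the rule quoted above (copies=N needs at least N sheep daemons available, otherwise the cluster halts unless it was formatted with -H), here is a minimal Python sketch that parses `collie node list` output in the layout shown in this report and counts the storage nodes. The column layout is taken from the listings in this mail only. The function names are hypothetical, and treating only nodes with V-Nodes > 0 as counting toward the copies requirement is an assumption that the thread does not confirm.

```python
def parse_node_list(text):
    """Parse `collie node list` output (layout assumed from this report)."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: "- 0 10.6.0.100:7000 0 1677723146".
        # The header row is skipped because its Id field is not numeric.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        host, port = fields[2].rsplit(":", 1)
        nodes.append({
            "id": int(fields[1]),
            "host": host,
            "port": int(port),
            "vnodes": int(fields[3]),
            "zone": int(fields[4]),
        })
    return nodes


def storage_nodes(nodes):
    """Assumption: gateway-only daemons show 0 V-Nodes and hold no data."""
    return [n for n in nodes if n["vnodes"] > 0]


def enough_for_copies(nodes, copies):
    """True if the number of storage nodes is at least `copies`."""
    return len(storage_nodes(nodes)) >= copies


# Node list as reported after the crash on 10.6.0.100.
AFTER_CRASH = """\
M Id Host:Port V-Nodes Zone
- 0 10.6.0.102:7000 0 1711277578
- 1 10.6.0.102:7001 64 1711277578
"""

if __name__ == "__main__":
    nodes = parse_node_list(AFTER_CRASH)
    print(len(storage_nodes(nodes)), enough_for_copies(nodes, 2))
```

With the post-crash listing, only one storage node remains, so the check fails for --copies=2. This only describes when a halt would be expected under the stated assumption; it does not explain why the daemons on 10.6.0.100 exited.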