> > Maybe we can delay the start of recovery for some time (1h)? That way
> > a normal server reboot does not harm.
>
> Then how do you handle IOs routed to the down node if you don't recover
> the membership state? like 'recovery in process'?

It simply delays the start of the data copy by some time. For example,
you could set 'recovery_delay' to:

  0       => start immediately (current behaviour)
  X       => start copying data after X seconds
  MAX_INT => never start (manual)

When set to a moderate value (5 minutes), you can simply reboot a server
without problems. Recovery takes quite long (depending on the amount of
data and network speed), so adding a short delay should not do any harm?

- Dietmar
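
To make the three proposed 'recovery_delay' settings concrete, here is a rough sketch in Python. It is only an illustration of the idea, not existing code: the function names, the MANUAL constant (standing in for MAX_INT) and the 'node_rejoined' flag are all hypothetical. It also assumes that a node which comes back before the delay expires needs no data copy at all, which is the point of the reboot case above.

```python
import sys

# Hypothetical stand-in for MAX_INT: recovery never starts on its own.
MANUAL = sys.maxsize


def recovery_start_time(node_down_at, recovery_delay):
    """Return the earliest time (in seconds) at which the data copy may
    begin, or None if recovery must be triggered manually."""
    if recovery_delay == MANUAL:
        return None
    return node_down_at + recovery_delay


def should_start_recovery(now, node_down_at, recovery_delay, node_rejoined):
    """Decide whether to start copying data for a node that went down.

    Assumption: if the node rejoined before the delay expired (for
    example after a normal reboot), no copy is started.
    """
    if node_rejoined:
        return False
    start = recovery_start_time(node_down_at, recovery_delay)
    return start is not None and now >= start
```

With recovery_delay = 300, a server that reboots and rejoins within 5 minutes triggers no copy; one that stays down longer starts recovery once the 300 seconds have passed.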