[sheepdog] [PATCH] sheep: add a kill node operation

Liu Yuan namei.unix at gmail.com
Fri Jul 20 09:25:51 CEST 2012


On 07/20/2012 02:55 PM, Dietmar Maurer wrote:
> Ok, let me explain with a simple example:
> 
> - 3 nodes with 1TB disk space, --copies 2
> - 50% used
> 
> Now I want to install a new kernel on one node, so I need to reboot, which takes about 3 minutes.
> 
> At reboot, when sheepdog is stopped, both remaining nodes start object recovery. Each node
> needs to copy about 0.5*0.5*1TB = 250GB of data.
> 
> Such a large amount of data keeps the network at 100% utilization until the rebooted node comes up again.
> 
> Is that expected behavior?
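
The estimate in the quoted example can be reproduced with a short sketch.
It assumes data is spread evenly across nodes and that the departed
node's share is split evenly among the survivors. The 1 Gbit/s link
speed is a hypothetical figure for illustration only, not something
stated in this thread.

```python
def recovery_per_node_tb(nodes, disk_tb, used_fraction):
    """Rough amount of data (TB) each surviving node copies when one node leaves.

    Assumes an even data distribution and an even split among survivors.
    """
    lost_tb = used_fraction * disk_tb   # data that was held by the departed node
    survivors = nodes - 1
    return lost_tb / survivors


def transfer_seconds(data_gb, link_gbit_per_s):
    """Time to move data_gb gigabytes over a link of the given speed."""
    return data_gb * 8 / link_gbit_per_s


per_node_tb = recovery_per_node_tb(nodes=3, disk_tb=1.0, used_fraction=0.5)
per_node_gb = per_node_tb * 1000        # 1 TB taken as 1000 GB

# Hypothetical 1 Gbit/s link; adjust to the real network.
seconds = transfer_seconds(per_node_gb, link_gbit_per_s=1.0)

print(f"{per_node_gb:.0f} GB per surviving node")
print(f"about {seconds / 60:.0f} minutes at 1 Gbit/s")
```

Under these assumptions the copy would take far longer than the
3 minute reboot, so the network stays busy for the whole outage.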

Yes, for now. A mechanism for detecting temporary node failures is not
easy to implement. It needs fundamental changes to the current recovery
and IO path code. The hardest part to get right is how we handle IOs
routed to the temporarily failed node.

Thanks,
Yuan


