At Wed, 16 Jun 2010 18:57:38 +0200,
Wido den Hollander wrote:
> 
> Hi,
> 
> A few months ago when i was testing sheepdog there were some issues with
> nodes joining and leaving the cluster.
> 
> For example, if i turned my whole cluster off and turned it back on
> again, the cluster wouldn't come online and i would have to do a fresh
> mkfs.
> 
> Has this been addressed already?
> 

The current version should be much better than the one you tested
before. I think nodes joining and leaving should work well now, though
this has not been tested enough yet.

Rebooting a Sheepdog cluster without first running a shutdown command
is not supported yet. I think we should consider the following
situations:

1) The administrator mistakenly shuts down all the nodes before
   running a shutdown command

   In this case, the nodes do not go down at the same time, so the
   internal membership information in the sheepdog daemons is updated
   incorrectly. It is not easy to fix this automatically because
   Sheepdog does not have static membership information. I am thinking
   of providing a command to fix the membership information manually.

2) A power failure occurs

   In this case, the membership information in the sheepdog daemons
   cannot become inconsistent, because all nodes go down at the same
   time. However, if VMs were writing data when the power failure
   occurred, the data objects may become inconsistent. We need to fix
   them, and I think we can do it automatically.

Thanks,

Kazutaka