[sheepdog-users] Sheepdog 0.9 missing live migration feature
Hitoshi Mitake
mitake.hitoshi at gmail.com
Wed May 13 12:22:11 CEST 2015
At Wed, 13 May 2015 17:49:41 +0800,
Liu Yuan wrote:
>
> On Mon, May 11, 2015 at 09:13:10PM +0900, Hitoshi Mitake wrote:
> > On Mon, May 11, 2015 at 8:58 PM, Walid Moghrabi
> > <walid.moghrabi at lezard-visuel.com> wrote:
> > > Hi,
> > >
> > >> Sorry for keeping you waiting. I'll backport the patch tonight.
> > >
> > > You're great :D
> >
> > I released v0.9.2_rc0. Please try it out:
> > https://github.com/sheepdog/sheepdog/releases/tag/v0.9.2_rc0
> >
> > >
> > >> Thanks a lot for your help. But I need to say that journaling and
> > >> object cache are unstable features. Please don't use them in
> > >> production.
> > >
> > > Too bad :(
> > > I was really happy to try this on my setup; I equipped every node with a separate SSD drive on which I wanted to store the Sheepdog journal and/or object cache.
> > > Why are these features "unstable"?
> > > What are the risks? Under which conditions shouldn't I use them?
> >
> > As far as we know, there is a risk of sheep daemon crashes under heavy load.
> >
> > >
> > > Unless the risk is severe, I think I'll still give it a try (at least in my crash tests before moving the cluster to production) because it looks promising. Anyway, Sheepdog has never been declared stable, and I've been using it with real joy since 0.6, even on a production platform, so ... ;)
> > >
> > > Anyway, just out of my own curiosity, here is what I'm planning for my setup; I'd really appreciate any comments on it:
> > >
> > > 9 nodes, each with:
> > > - 2 network interfaces: one for cluster communication (the "main" network) and one dedicated to Sheepdog's replication (the "storage" network), with fixed IPs, completely closed off, and jumbo frames enabled (MTU 9000)
> > > - 3 dedicated 600 GB 15k SAS hard drives that are not part of any RAID (standalone drives) and that I was thinking of using in MD (multi-disk) mode
> > > - 1 SATA SSD drive (on which the OS resides and which I was thinking of using for Sheepdog's journal and object cache)
> > >
> > > So that means a 27-hard-drive cluster that I want to format with erasure coding, but so far I don't really know which settings to configure for it ... I'd like to find the right balance between performance, safety, and storage space ... any suggestion is most welcome.
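For reference, a minimal sketch of how such a node could be started in multi-disk mode and how the cluster could be formatted with erasure coding. The paths, the 4:2 scheme, and the separate I/O address option are illustrative assumptions, not a confirmed recipe; check sheep --help and the dog manual for your exact build:

    # Start sheep in multi-disk (MD) mode: the first path holds the
    # metadata/config, the remaining comma-separated paths are the three
    # standalone SAS disks. The -i option, if available in your build,
    # binds replication/I-O traffic to the dedicated storage network.
    sheep -c corosync -i host=10.0.0.1 \
        /var/lib/sheepdog,/mnt/sas1,/mnt/sas2,/mnt/sas3

    # Erasure coding with 4 data strips + 2 parity strips: survives the
    # loss of any 2 strips at roughly 1.5x storage overhead.
    dog cluster format -c 4:2

With 9 nodes, a 4:2 scheme leaves room for node failures while keeping the overhead well below plain 3-copy replication.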
> >
> > I don't see anything bad in your configuration. But I suggest being as
> > conservative as possible. For example, don't enable optimizations (the
> > -n option, for example) if your current configuration already provides
> > enough performance. Our internal testing focuses on the basic
> > components, so they should be stable enough. But we cannot allocate
> > time for testing the optional features (testing and benchmarking
> > distributed storage is really costly), so the optional features are
> > likely to have more bugs than the basic components.
> >
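As a concrete example of that conservative approach, a sheep could be started with nothing but the cluster driver and the store directory, leaving the journal, object cache, and write optimizations disabled. This is only a sketch; the path and option availability depend on your build:

    # Conservative start: no journal, no object cache, no extra
    # optimizations -- only the cluster driver and the store directory.
    sheep -c corosync /var/lib/sheepdog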
>
> Sorry for cutting into your conversation, but based on our recent tests, I'm
> afraid the basic components aren't as stable as you think. When the data grow
> to several TB, our cluster sometimes crashes even from a single 'dog vdi
> delete' command. Even restarting a sheep can cause another sheep, or the
> whole cluster, to crash. The good side is that, after crashing day after day,
> the data are still in good shape; no loss has been found yet. No object cache
> is enabled in our environment.
>
> Our setup is pure gateway nodes + sheep storage nodes (15 nodes).
>
> Thanks,
> Yuan
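For reference, a "pure gateway" node here means a sheep that serves client I/O but stores no objects locally. A rough sketch of starting one, assuming the gateway option in 0.9 (the path and cluster driver are illustrative):

    # Gateway-only sheep: routes requests to the storage nodes,
    # keeps no data on this node.
    sheep -c corosync -g /var/lib/sheepdog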
Could you provide the logs of the crashed sheep daemons?
Thanks,
Hitoshi