[sheepdog-users] Sheepdog 0.9 missing live migration feature

Walid Moghrabi walid.moghrabi at lezard-visuel.com
Tue May 12 13:09:49 CEST 2015


Hi,

Just tried 0.9.2_rc0 and it works as expected!
Live migration between nodes is working again, and live migration between storage is working too!
So far, I haven't encountered any problems with this release.

>I'd suggest MD + Object Cache + Dual NICs. Since you make use of the object
>cache, there is no need to enable '-n'. Basically, you might take the following
>as an example:

> #-w
> #256G is just a placeholder; you can adjust it on your own. If you find that
> #performance is not good enough, you can try turning off 'directio'; the object
> #cache code will then take advantage of memory as the cache tier. But this might
> #require you to tune some kernel memory-flush settings for smooth performance.

> #/meta should be put on RAID since it is a single point of failure. MD will take
> #care of your disk1, disk2, disk3. The trailing '--directio' means: don't use
> #memory for the backend store. '-n' would be helpful if you find that overall
> #performance sometimes drops. '-n' in this case will affect the performance of
> #the object cache when it flushes back dirty data.

> #-c
> #for the cluster driver, I'd suggest zookeeper

> sheep -w size=256G,dir=/path/to/ssd,directio -i 'nic ip for IO' -y 'your main nic ip' \
>       /meta,/disk1,/disk2,/disk3 -c xxx --directio

OK, I'll try these settings. I just have a few questions:
You say that /meta should be located on a RAID device because it is a SPOF... does that mean that if /meta fails for some reason, the whole node goes down?
If so, can I rely on Sheepdog's redundancy? I mean, if I lose one node in my 9-node cluster, that shouldn't be a problem, right? So I don't really see a problem with leaving this as a SPOF (what I mean is that I was thinking of putting it on the dedicated SSD, which is not in a RAID configuration).
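For what it's worth, after losing a node I'd watch the cluster heal with the dog CLI, something like this (assuming these subcommands behave the same in 0.9):

dog cluster info      # overall cluster state and epoch history
dog node list         # which nodes are still members
dog node recovery     # per-node object recovery progress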

If I understand correctly, directio performance depends on the underlying physical storage, so on a decent SSD it should give good results, right?
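One way I'd verify that is to benchmark the SSD with direct I/O using fio (the path and sizes below are just placeholders):

# 4k random writes with O_DIRECT, bypassing the page cache
fio --name=ssd-directio --filename=/path/to/ssd/testfile --size=1G \
    --direct=1 --rw=randwrite --bs=4k --runtime=30 --time_based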

I don't really understand the '-n' option... can you explain what it does and in which cases it is recommended to enable it?

Lastly, I'm still wondering which settings would be best when formatting the cluster with Erasure Code. I have 9 nodes and I'd like to find a good balance between performance, capacity and safety.
As I understand it, in the x:y tuple I need at least x nodes alive for the cluster to remain functional, and I can lose up to y nodes at the same time with no data loss and no downtime... right?
The documentation is not very clear about x... here (https://github.com/sheepdog/sheepdog/wiki/Erasure-Code-Support) it says that x must be one of 2, 4, 8 or 16 (a power of 2), while there (http://www.sheepdog-project.org/doc/redundancy_level.html) it says x can be any multiple of 2 (2, 4, 6, 8, 10, 12, ...)... Which one is right?

For my part, I was hesitating between these settings (see the sketch after this list):
- 4:2, because I can run with very few nodes, though I can lose only 2 nodes; the parity ratio is 0.5, which seems a good balance (only 1.5x the VDI size as real storage usage, and recovery cost divided by 2)
- 6:3, which would give me the same ratio but would let me lose 3 nodes (though I'd need 2/3 of my cluster still alive, so it only just fits my current configuration)
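As a sketch (based on the wiki page linked above; exact option syntax may vary between releases), the two profiles would be set at format time like this:

# 4:2 -> any 4 of the 6 strips can rebuild the data, so 2 nodes may fail;
# real usage = (4+2)/4 = 1.5x the logical VDI size
dog cluster format -c 4:2

# 6:3 -> same 1.5x overhead ((6+3)/6), tolerates 3 failed nodes
dog cluster format -c 6:3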

What's your opinion?

Best regards and again, many thanks for your help and kindness.

Walid


----- Original Message -----
From: "Liu Yuan" <namei.unix at gmail.com>
To: "Walid Moghrabi" <walid.moghrabi at lezard-visuel.com>
Cc: "Hitoshi Mitake" <mitake.hitoshi at gmail.com>, "sheepdog-users" <sheepdog-users at lists.wpkg.org>
Sent: Tuesday, May 12, 2015 04:25:40
Subject: Re: [sheepdog-users] Sheepdog 0.9 missing live migration feature

On Mon, May 11, 2015 at 01:58:07PM +0200, Walid Moghrabi wrote:
> Hi,
> 
> > Sorry for keeping you waiting. I'll backport the patch tonight.
> 
> You're great :D
> 
> > Thanks a lot for your help. But I need to say that journaling and
> > object cache are unstable features. Please don't use them in
> > production.
> 
> Too bad :(
> I was really happy to try this on my setup; I equipped every node with a separate SSD drive on which I wanted to store the Sheepdog journal and/or object cache.
> Why are these features "unstable"?
> What are the risks? Under which conditions shouldn't I use them?
> 
> Unless there is serious risk, I think I'll still give it a try (at least in my crash tests before moving the cluster to production) because it looks promising. Anyway, Sheepdog has never been declared stable, and I've been using it with real joy since 0.6, even on a production platform, so... ;)
> 
> Anyway, just out of my own curiosity, here is what I'm planning for my setup; I'd really appreciate any comments on it:
> 
> 9 nodes, each with:
>   - 2 interfaces: one for cluster communication (the "main" network) and one dedicated to Sheepdog's replication (the "storage" network), with fixed IPs, completely closed off, and jumbo frames enabled (MTU 9000)
>   - 3 dedicated 600GB 15k SAS hard drives that are not part of any RAID (standalone drives), which I was thinking of using in MD mode
>   - 1 SATA SSD (on which the OS resides, and which I was thinking of using for Sheepdog's journal and object cache)
> 
> So that means a 27-hard-drive cluster that I want to format using Erasure Code, but so far I don't really know which settings to configure... I'd like to find a good balance between performance, safety and storage space... any suggestions are most welcome.

I'd suggest MD + Object Cache + Dual NICs. Since you make use of the object
cache, there is no need to enable '-n'. Basically, you might take the following
as an example:

#-w
#256G is just a placeholder; you can adjust it on your own. If you find that
#performance is not good enough, you can try turning off 'directio'; the object
#cache code will then take advantage of memory as the cache tier. But this might
#require you to tune some kernel memory-flush settings for smooth performance.
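#For example, the flush behaviour can be tuned with sysctls like these
#(illustrative values only, not a recommendation; tune for your workload):
#
#  sysctl -w vm.dirty_background_ratio=5    # start background writeback earlier
#  sysctl -w vm.dirty_ratio=10              # cap dirty memory before writers block
#  sysctl -w vm.dirty_expire_centisecs=300  # flush dirty pages older than ~3s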

#/meta should be put on RAID since it is a single point of failure. MD will take
#care of your disk1, disk2, disk3. The trailing '--directio' means: don't use
#memory for the backend store. '-n' would be helpful if you find that overall
#performance sometimes drops. '-n' in this case will affect the performance of
#the object cache when it flushes back dirty data.
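#For example, /meta could live on a small RAID1 built with mdadm
#(device names below are placeholders):
#
#  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
#  mkfs.ext4 /dev/md0
#  mount /dev/md0 /meta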

#-c
#for the cluster driver, I'd suggest zookeeper

sheep -w size=256G,dir=/path/to/ssd,directio -i 'nic ip for IO' -y 'your main nic ip' \
      /meta,/disk1,/disk2,/disk3 -c xxx --directio
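A filled-in invocation might look like this (the IPs, paths and zookeeper
endpoints are placeholders; double-check the exact '-i' and '-c' syntax with
'sheep --help' on your build):

sheep -w size=256G,dir=/ssd/cache,directio -i 10.0.1.11 -y 192.168.1.11 \
      /meta,/disk1,/disk2,/disk3 \
      -c zookeeper:192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181 --directio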

Thanks,
Yuan


