[sheepdog-users] dog cluster snapshot problems

Thu Aug 7 17:33:56 CEST 2014

The slowness is an issue for me. On our cluster, a cluster snapshot already takes more than six hours to complete, whereas the backup script runs in roughly 2 hours for a full backup and about 10 minutes for an incremental.

I've written a script that takes a snapshot on the first Sunday of the month and writes it out with qemu-img convert -O qcow2. That full copy stays in place. Every day after that, the script takes a new snapshot and dumps the incremental with dog vdi backup -F <first Sunday> -s <current_date> | gzip -9 > file.img. I've considered switching from cumulative incrementals (each one against the first Sunday's snapshot) to sliding ones (each against the previous day's), but our data doesn't change enough for the cumulative files to get that large.
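The date logic of a script like this can be sketched as follows. This is a hypothetical sketch, not the actual script: it assumes each incremental is taken against the snapshot from the first Sunday of the month, that snapshots are tagged with ISO dates, and that files are named <vdi>-<date>.qcow2 / .img.gz. The dog and qemu-img flags mirror the commands quoted above; the position of the vdi name and the sheepdog:<vdi>:<tag> image syntax are assumptions.

```python
import datetime as dt
from typing import List


def first_sunday(day: dt.date) -> dt.date:
    """Return the first Sunday of the month that contains `day`."""
    first = day.replace(day=1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    return first + dt.timedelta(days=(6 - first.weekday()) % 7)


def base_snapshot(day: dt.date) -> dt.date:
    """Date of the monthly full that `day`'s incremental is taken against.

    Days before this month's first Sunday still belong to last month's cycle.
    """
    base = first_sunday(day)
    if day < base:
        base = first_sunday(day.replace(day=1) - dt.timedelta(days=1))
    return base


def nightly_commands(vdi: str, day: dt.date) -> List[str]:
    """Shell commands for one night (tags and file names are hypothetical)."""
    tag = day.isoformat()
    cmds = [f"dog vdi snapshot -s {tag} {vdi}"]
    base = base_snapshot(day)
    if day == base:
        # Monthly full: export the new snapshot as a standalone qcow2 image.
        cmds.append(f"qemu-img convert -O qcow2 sheepdog:{vdi}:{tag} {vdi}-{tag}.qcow2")
    else:
        # Cumulative incremental: everything changed since the monthly base.
        cmds.append(
            f"dog vdi backup -F {base.isoformat()} -s {tag} {vdi}"
            f" | gzip -9 > {vdi}-{tag}.img.gz"
        )
    return cmds


for cmd in nightly_commands("web01", dt.date(2014, 8, 7)):
    print(cmd)
```

Because every incremental is relative to the same monthly base, any single daily file plus the month's qcow2 is enough to rebuild that day; a sliding scheme would produce smaller files but need the whole chain.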

So restoring to a particular date means a qemu-img convert of the full back into sheepdog, followed by gunzip -c file.img | dog vdi restore -s <current_date>.
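Choosing which files a restore needs can be sketched like this. It is a minimal, hypothetical helper that only plans the restore (it runs no commands) and assumes cumulative incrementals: the latest full on or before the target date, plus the incremental for that exact date unless the target is itself a full.

```python
import datetime as dt
from typing import Optional, Sequence, Tuple


def restore_plan(
    target: dt.date,
    fulls: Sequence[dt.date],
    incrementals: Sequence[dt.date],
) -> Tuple[dt.date, Optional[dt.date]]:
    """Return (full_date, incremental_date or None) needed to rebuild `target`."""
    bases = [d for d in fulls if d <= target]
    if not bases:
        raise ValueError(f"no full backup on or before {target}")
    base = max(bases)
    if target == base:
        return base, None  # the qcow2 full alone is enough
    if target not in incrementals:
        raise ValueError(f"no incremental for {target}")
    # Cumulative scheme: one incremental on top of the full, no chain to replay.
    return base, target


fulls = [dt.date(2014, 7, 6), dt.date(2014, 8, 3)]
daily = [dt.date(2014, 8, d) for d in range(4, 8)]
print(restore_plan(dt.date(2014, 8, 7), fulls, daily))
```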

The script also cleans up after itself: I keep every full and incremental for the last 2 weeks, weeklies (full + Sunday incremental) for 2 months, and only the monthly fulls after that.
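The retention rule can be expressed as a single predicate. This is a sketch under stated assumptions: "2 months" is approximated as 61 days, "weeklies" is read as the monthly full plus the Sunday incrementals, and `is_full` marks the first-Sunday qcow2 export.

```python
import datetime as dt

DAILY_DAYS = 14   # "last 2 weeks"
WEEKLY_DAYS = 61  # "2 months", approximated as 61 days (assumption)


def keep(backup_day: dt.date, is_full: bool, today: dt.date) -> bool:
    """Decide whether a backup file survives the cleanup pass."""
    age = (today - backup_day).days
    if age <= DAILY_DAYS:
        return True  # keep every full and incremental
    if age <= WEEKLY_DAYS:
        # Weeklies: the full plus Sunday incrementals (weekday() == 6).
        return is_full or backup_day.weekday() == 6
    return is_full  # after that, only the monthly fulls


today = dt.date(2014, 8, 7)
for day, full in [(dt.date(2014, 7, 22), False), (dt.date(2014, 7, 20), False)]:
    print(day, "keep" if keep(day, full, today) else "delete")
```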

I was originally enchanted by the idea of using cluster snapshot, but we've been through enough cycles of memory leaks and other issues that I've fallen back on the approach above to make sure my backups happen. It runs from cron and I can forget about it; I just add a vdi name to the script whenever we add a new instance.

On 08/07/2014 09:46 AM, Valerio Pachera wrote:

2014-08-07 15:32 GMT+02:00 Andrew J. Hobbs <ajhobbs at desu.edu>:
My suspicion is there's some issue with uniqueness, and that you're clobbering the older snapshots when you reuse a tag.

I'll try changing it and see what happens.

I've only used the snapshot feature a few times, as I've found it less useful than scripting conversions to qcow2 followed by incrementals through the week/month.

Could you explain that in more detail?
I think cluster snapshot is very space-efficient.

I also had performance issues when saving snapshots to a remote NAS in another building.

Regarding performance, I also noticed it's very slow (the first run much slower than the later ones).
Something like 5-6 MB/s.
The upside of this is that you can run it any time without impacting cluster performance.
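To put 5-6 MB/s in perspective, a quick back-of-the-envelope calculation. It assumes "M/s" means megabytes per second, decimal units (1 TB = 10^12 bytes), and a steady rate; if restore runs no faster, the same figures apply to restoring terabytes.

```python
def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move `size_bytes` at a steady `rate_bytes_per_s`."""
    return size_bytes / rate_bytes_per_s / 3600


TB = 10**12  # decimal terabyte (assumption about units)
MB = 10**6

for rate in (5, 6):
    print(f"1 TB at {rate} MB/s: {transfer_hours(TB, rate * MB):.1f} h")
```

That works out to roughly 46 to 56 hours per terabyte, i.e. about two days.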

Restore is also very slow.
This may be a major problem if I have to restore terabytes of data.

For now, though, I'm focusing on space efficiency and the ability to restore single disks (not tested yet because of the problem I just noticed).
