[sheepdog] effective storing backups and deduplication

Liu Yuan namei.unix at gmail.com
Wed Feb 11 13:43:44 CET 2015

On Wed, Feb 11, 2015 at 04:32:34PM +0400, Vasiliy Tolstov wrote:
> 2015-02-11 15:28 GMT+03:00 Liu Yuan <namei.unix at gmail.com>:
> > We need to know what the user's backups are. Is it the whole vdi or delta
> > data for different vdis?
> I think the best scheme is:
> 1) If no backup exists for the vdi - create a full backup (this is a
> simple copy of all data)
> 2) If a backup already exists - create a new backup and copy only the
> delta from the previous backup.
> 3) If the user deletes an old backup - remove garbage pieces that do not
> belong to other vdis.
> 4) In steps 1 and 2 - check other vdi pieces for duplicate
> data and store only the difference. But I think this is very problematic
> in this case.

This scheme can be built on sheepdog's current features:

0 use qemu-img (recommended because of better performance) or dog to read
  the base
1 use dog to back up the delta data between different snapshots taken by
  qemu-img snapshot or dog vdi snapshot.
2 manage the relations between the base, the delta data and the user defined
  snapshots in the upper layer
3 use SD http storage to store the base and delta data.
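
To illustrate the base-plus-delta idea in the steps above, here is a toy
model in Python. It is not how dog encodes its backup stream. It only assumes
a snapshot can be treated as a byte string split into fixed-size chunks, and
the names CHUNK, split, make_delta and apply_delta are made up for this
example. It shows that a base plus a chain of deltas is enough to rebuild a
later snapshot.

```python
CHUNK = 4  # tiny chunk size so the example is easy to follow (hypothetical)


def split(data, size=CHUNK):
    """Cut a byte string into fixed-size chunks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_delta(old, new, size=CHUNK):
    """Record only the chunks of `new` that differ from `old`."""
    old_chunks = split(old, size)
    changed = {}
    for idx, chunk in enumerate(split(new, size)):
        if idx >= len(old_chunks) or old_chunks[idx] != chunk:
            changed[idx] = chunk
    return {"length": len(new), "chunks": changed}


def apply_delta(old, delta, size=CHUNK):
    """Rebuild the newer snapshot from the older one plus its delta."""
    count = (delta["length"] + size - 1) // size
    chunks = split(old, size)[:count]
    chunks += [b""] * (count - len(chunks))  # room for chunks past the old end
    for idx, chunk in delta["chunks"].items():
        chunks[idx] = chunk
    return b"".join(chunks)[:delta["length"]]


base = b"aaaabbbbcccc"
snap1 = b"aaaaXXXXcccc"    # one chunk rewritten
snap2 = b"aaaaXXXXccccdd"  # data appended
d1 = make_delta(base, snap1)
d2 = make_delta(snap1, snap2)
restored = apply_delta(apply_delta(base, d1), d2)
```

Restoring snap2 means fetching the base and then applying d1 and d2 in order,
so every delta in the chain has to stay available for as long as a later
backup depends on it.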

I guess you need a middle layer that maps the user defined snapshots to
sheepdog's base and delta data, and gc should be implemented in this middle
layer. Authentication would also be better implemented in this middleware.
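
A minimal sketch of the bookkeeping such a middle layer might do, in Python.
Everything here is hypothetical: the class name BackupCatalog, its methods
and the storage keys are invented for illustration and are not part of
sheepdog. It assumes each backup is either a base (no parent) or a delta on
top of one parent, and that a blob can be dropped once no backup still
visible to the user depends on it. Authentication is left out.

```python
class BackupCatalog:
    """Maps user-visible backup names to stored base/delta blobs."""

    def __init__(self):
        self.parent = {}      # backup name -> parent name (None for a base)
        self.blob = {}        # backup name -> storage key of its base/delta
        self.visible = set()  # backups the user has not deleted

    def add(self, name, blob_key, parent=None):
        if name in self.blob:
            raise ValueError("backup already exists: " + name)
        if parent is not None and parent not in self.blob:
            raise KeyError("unknown parent backup: " + parent)
        self.parent[name] = parent
        self.blob[name] = blob_key
        self.visible.add(name)

    def restore_chain(self, name):
        """Storage keys to fetch, ordered from the base to `name`."""
        if name not in self.visible:
            raise KeyError("no such backup: " + name)
        chain = []
        cur = name
        while cur is not None:
            chain.append(self.blob[cur])
            cur = self.parent[cur]
        return chain[::-1]

    def delete(self, name):
        """Hide a backup from the user, then return the keys gc can free."""
        self.visible.discard(name)
        return self.gc()

    def gc(self):
        # Mark every backup reachable from a visible one by following parents.
        needed = set()
        for name in self.visible:
            cur = name
            while cur is not None and cur not in needed:
                needed.add(cur)
                cur = self.parent[cur]
        # Sweep the rest and report which blobs may be removed from storage.
        garbage = [n for n in self.blob if n not in needed]
        freed = sorted(self.blob.pop(n) for n in garbage)
        for n in garbage:
            del self.parent[n]
        return freed


cat = BackupCatalog()
cat.add("mon", "blob-base")
cat.add("tue", "blob-d1", parent="mon")
cat.add("wed", "blob-d2", parent="tue")
```

With this model, deleting "mon" frees nothing while "tue" and "wed" still
depend on its base. The base only becomes garbage after the last backup in
the chain is deleted.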

