[sheepdog] vdi backup/restore questions

Wed Sep 12 17:14:29 CEST 2012

I'm very glad to see this feature.  I have been testing it on and off
since Sunday when it was mentioned.  I wonder if it is possible to
allow the usage of stdin/stdout in addition to files for backup and
restore, similar to "vdi read"/"vdi write"?  This would help with
streaming the "snaps" via ssh to another machine or another cluster,
or with piping them through compression before writing to disk.
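
To illustrate the kind of pipeline I mean: "vdi read" already writes to stdout, so a full image can be streamed today. This is only a minimal sketch, assuming a POSIX shell. The helper name stream_gz and the host name backup-host are made up for illustration, and the collie line is left as a comment because it needs a running cluster.

```shell
# Run a producer command, gzip its stdout, and feed the result to a
# consumer command.  Both commands are passed as strings.
# stream_gz is a hypothetical helper, not part of collie.
stream_gz() {
    sh -c "$1" | gzip -c | sh -c "$2"
}

# Intended use with a full image (backup-host is a placeholder):
#   stream_gz 'collie vdi read -s initial test' \
#             'ssh backup-host "cat > /backup/test.initial.gz"'
```

The same shape would work for "vdi backup" if it could write to stdout, which is exactly what I am asking for.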

Also, I will still eventually run out of space doing this, because the
"80" (index?) files are left behind after deleting a snapshot or vdi.
I brought this up a few months ago and couldn't tell whether I had
missed something.  I am running the dev tree code from late Sunday.
See below for my steps.

----- START CLUSTER CONFIG -----
[root@sd0 ~]# sheep -c corosync -y 192.168.56.101 -z 0 /sheep
[root@sd1 ~]# sheep -c corosync -y 192.168.56.102 -z 1 /sheep
[root@sd2 ~]# sheep -c corosync -y 192.168.56.103 -z 2 /sheep
[root@sd0 ~]# collie cluster format -b farm -c 3 -m unsafe
----- STOP CLUSTER CONFIG -----

----- START CLUSTER INFO -----
[root@sd0 ~]# collie node info
Id	Size	Used	Use%
 0	16 GB	0.0 MB	  0%
 1	16 GB	0.0 MB	  0%
 2	16 GB	0.0 MB	  0%
Total	48 GB	0.0 MB	  0%

[root@sd0 ~]# collie cluster info
Cluster status: running
Cluster created at Wed Sep 12 10:34:40 2012
Epoch Time           Version
2012-09-12 10:34:41      1 [192.168.56.101:7000, 192.168.56.102:7000, 192.168.56.103:7000]
----- STOP CLUSTER INFO -----

Now if I create vdi "test", I have no data yet, only the "80" file of
about 4 MB.
[root@sd0 ~]# collie vdi create test 128M
[root@sd0 ~]# collie node info
Id	Size	Used	Use%
 0	16 GB	4.0 MB	  0%
 1	16 GB	4.0 MB	  0%
 2	16 GB	4.0 MB	  0%
Total	48 GB	12 MB	  0%
Total virtual image size	128 MB
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4198968 Sep 12 10:35 807c2b2500000000

Now I write a little bit of data to the vdi and get the "80" and one
"data" file.
[root@sd0 ~]# echo "initial" | collie vdi write test 0 512
[root@sd0 ~]# collie vdi read test 0 512
initial
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4194304 Sep 12 10:38 /sheep/obj/007c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:38 /sheep/obj/807c2b2500000000

Now I take a snapshot "initial", which creates another "80" file but
no new "data" file, because no data has been written since the snapshot.
[root@sd0 ~]# collie vdi snapshot -s initial test
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4194304 Sep 12 10:38 /sheep/obj/007c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:39 /sheep/obj/807c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:39 /sheep/obj/807c2b2600000000
[root@sd0 ~]# collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
s test         1  128 MB  4.0 MB  0.0 MB 2012-09-12 10:35   7c2b25     3       initial
  test         2  128 MB  0.0 MB  4.0 MB 2012-09-12 10:39   7c2b26     3
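
The file names seem to line up with the "VDI id" column: the first two hex digits distinguish the "80" files from the "data" files, the next six match the VDI id, and the rest looks like an index within the vdi. A small sketch that splits a name that way, assuming this reading of the names is right (I have only inferred it from the listings here); describe_object is a name I made up:

```shell
# Split a 16-hex-digit object file name from /sheep/obj into its parts.
# Assumption: digits 1-2 are a type prefix, 3-8 the VDI id, 9-16 an index.
describe_object() {
    prefix=$(printf '%s' "$1" | cut -c1-2)
    vdi_id=$(printf '%s' "$1" | cut -c3-8)
    index=$(printf '%s' "$1" | cut -c9-16)
    case $prefix in
        80) kind=inode ;;    # the ~4 MB "80" file, one per vdi or snapshot
        00) kind=data ;;     # a 4 MB "data" file
        *)  kind=unknown ;;
    esac
    printf '%s vdi=%s index=%s\n' "$kind" "$vdi_id" "$index"
}

describe_object 807c2b2500000000   # prints: inode vdi=7c2b25 index=00000000
describe_object 007c2b2500000000   # prints: data vdi=7c2b25 index=00000000
```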

Now I write new data into the vdi and create an additional snapshot
"ONE".  So now I have three "80" files and two "data" files, one for
each of the two snapshotted ids.
[root@sd0 ~]# echo "ONE" | collie vdi write test 0 512
[root@sd0 ~]# collie vdi snapshot -s ONE test
[root@sd0 ~]# collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
s test         1  128 MB  4.0 MB  0.0 MB 2012-09-12 10:35   7c2b25     3       initial
s test         2  128 MB  4.0 MB  0.0 MB 2012-09-12 10:39   7c2b26     3           ONE
  test         3  128 MB  0.0 MB  4.0 MB 2012-09-12 10:40   7c2b27     3
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4194304 Sep 12 10:38 /sheep/obj/007c2b2500000000
-rw-r----- 1 root root 4194304 Sep 12 10:40 /sheep/obj/007c2b2600000000
-rw-r----- 1 root root 4198968 Sep 12 10:39 /sheep/obj/807c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:40 /sheep/obj/807c2b2600000000
-rw-r----- 1 root root 4198968 Sep 12 10:40 /sheep/obj/807c2b2700000000

So now I want to back up the vdi.  First I create a "full" backup.
[root@sd0 ~]# collie vdi read -s initial test | gzip > /tmp/test.initial.gz

Then I create a "diff" backup:
[root@sd0 ~]# collie vdi backup -s ONE -F initial test /tmp/test.initial-ONE

Now I delete the snaps:
[root@sd0 ~]# collie vdi delete -s ONE test
[root@sd0 ~]# collie vdi delete -s initial test
[root@sd0 ~]# collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  test         3  128 MB  0.0 MB  4.0 MB 2012-09-12 10:40   7c2b27     3

But I am still using up disk space after they are deleted:
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4194304 Sep 12 10:38 /sheep/obj/007c2b2500000000
-rw-r----- 1 root root 4194304 Sep 12 10:40 /sheep/obj/007c2b2600000000
-rw-r----- 1 root root 4198968 Sep 12 10:53 /sheep/obj/807c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:53 /sheep/obj/807c2b2600000000
-rw-r----- 1 root root 4198968 Sep 12 10:40 /sheep/obj/807c2b2700000000

Even if I delete the actual vdi as well:
[root@sd0 ~]# collie vdi delete test
[root@sd0 ~]# collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
[root@sd0 ~]# ls -l /sheep/obj/*
-rw-r----- 1 root root 4198968 Sep 12 10:53 /sheep/obj/807c2b2500000000
-rw-r----- 1 root root 4198968 Sep 12 10:53 /sheep/obj/807c2b2600000000
-rw-r----- 1 root root 4198968 Sep 12 10:54 /sheep/obj/807c2b2700000000

I don't seem to be able to find a way to reclaim this space.  If I
just delete the files on each node, I get an error.
[root@sd0 ~]# find /sheep/obj -type f -name 807c2b27* | xargs rm -f
[root@sd1 ~]# find /sheep/obj -type f -name 807c2b27* | xargs rm -f
[root@sd2 ~]# find /sheep/obj -type f -name 807c2b27* | xargs rm -f
[root@sd0 ~]# collie vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
Failed to read object 807c2b2700000000 No object found
Failed to read inode header
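
Since removing the files by hand clearly upsets sheep, a safer way to see how much is left behind is to only report the "80" files whose VDI id no longer shows up in "collie vdi list". A rough sketch, assuming the six hex digits after the "80" prefix are the VDI id (inferred from the listings above); orphan_inodes is a made-up helper, the live ids are passed in by hand, and nothing is deleted:

```shell
# Print "80"-prefixed object files whose VDI id is not in the live list.
# Usage: orphan_inodes OBJ_DIR [LIVE_VDI_ID...]
# Report only: this never removes anything.
orphan_inodes() {
    obj_dir=$1
    shift
    live=" $* "                      # space-padded so whole ids match
    for path in "$obj_dir"/80*; do
        [ -f "$path" ] || continue   # skip the literal pattern if no match
        name=${path##*/}
        vdi_id=$(printf '%s' "$name" | cut -c3-8)
        case $live in
            *" $vdi_id "*) ;;                 # still listed, ignore
            *) printf '%s\n' "$path" ;;       # not listed, report it
        esac
    done
}

# Example: after the deletes above no VDI ids are live, so
#   orphan_inodes /sheep/obj
# would list the remaining "80" files on this node.
```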

Is there any way to reclaim this space?  We currently snapshot each
VDI (our PROD data is not in sheepdog yet) every 15 minutes, 24 hours
a day, for DR and ease of recovery.  We age these out, keeping just
hourly and then, at some point, daily snapshots, and so on.  We have
almost 100 VMs (some have more than one disk/VDI), and based on this
I would chew up ~384 MB/day/VDI * 3 (cluster copies) in leftover "80"
files alone: 96 snapshots a day at about 4 MB each.
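
For anyone who wants to check the arithmetic, here is a rough calculation in shell. The "80" file size is taken from the ls output above. The count of 100 VDIs is only a round number standing in for "almost 100 VMs", so the last figure is an estimate, not a measurement.

```shell
snapshots_per_day=$(( 24 * 60 / 15 ))   # one snapshot every 15 minutes
inode_bytes=4198968                     # size of one "80" file from ls -l
copies=3                                # cluster copies (-c 3)
vdi_count=100                           # assumption: roughly one VDI per VM

per_vdi_bytes=$(( snapshots_per_day * inode_bytes ))
per_vdi_mib=$(( per_vdi_bytes / 1048576 ))
cluster_mib=$(( per_vdi_bytes * copies / 1048576 ))
all_vdis_gib=$(( cluster_mib * vdi_count / 1024 ))

echo "snapshots per day:        $snapshots_per_day"    # 96
echo "per VDI per day:          $per_vdi_mib MiB"      # 384 MiB
echo "per VDI with 3 copies:    $cluster_mib MiB"      # 1153 MiB
echo "about $vdi_count VDIs, 3 copies: $all_vdis_gib GiB"   # 112 GiB
```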

Regards and thanks for everyone's work on this project.