[sheepdog-users] What can I do with my .stale objects ?

Walid Moghrabi walid.moghrabi at lezard-visuel.com
Tue Jan 13 15:49:25 CET 2015


Here are more details about my problem:


I have a 2-node cluster running Sheepdog 0.8.2 and Corosync 1.4.7 on Debian Wheezy. 
I/O and cluster communication use the same interface on both nodes (I know, that's bad), and Corosync is set up with unicast UDP since I can't use multicast on that network. 
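For reference, the unicast part of a Corosync 1.x configuration looks roughly like this (addresses are placeholders, not my real ones): 

# /etc/corosync/corosync.conf (excerpt), transport switched to unicast UDP 
totem { 
        version: 2 
        transport: udpu                         # udpu = unicast UDP instead of multicast 
        interface { 
                ringnumber: 0 
                bindnetaddr: 192.168.0.0        # placeholder network address 
                mcastport: 5405 
                member { 
                        memberaddr: 192.168.0.1 # placeholder, px1 
                } 
                member { 
                        memberaddr: 192.168.0.2 # placeholder, px2 
                } 
        } 
} 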


Sheepdog is used for QEMU/KVM VDI storage (I'm using Proxmox). 
I created a VM and rsync'ed a lot of data from one node's local filesystem to the VM's filesystem without any limit on rsync's bandwidth usage. 
rsync overloaded the network, Corosync lost communication at times, and that confused Sheepdog, causing lots of I/O errors and high CPU usage. 
My cluster is formatted as simple replication with 2 copies. 


Most VDIs were unaffected, but some were trashed. 


Here is some information: 


===================================== 

px1 ~ > dog node info 
Id Size Used Avail Use% 
0 3.0 TB 2.8 TB 190 GB 93% 
1 2.9 TB 2.8 TB 163 GB 94% 
Total 5.9 TB 5.6 TB 353 GB 94% 


Total virtual image size 1.6 TB 

===================================== 



=============================================================================== 

px1 ~ > dog vdi list 
Name Id Size Used Shared Creation time VDI id Copies Tag 
vm-1040-disk-1 0 20 GB 1.3 GB 0.0 MB 2014-10-14 10:35 1c693b 2 
vm-1220-disk-1 0 20 GB 3.5 GB 0.0 MB 2015-01-05 15:40 1f6c43 2 
vm-9001-disk-1 0 20 GB 1.3 GB 0.0 MB 2014-12-16 23:48 251fa6 2 
vm-1042-disk-1 0 20 GB 2.0 GB 0.0 MB 2014-10-27 16:51 449830 2 
vm-1210-disk-1 0 20 GB 1.7 GB 0.0 MB 2014-12-16 23:33 47227b 2 
vm-1043-disk-1 0 20 GB 1.5 GB 0.0 MB 2015-01-12 01:16 54a476 2 
vm-1224-disk-1 0 20 GB 2.2 GB 0.0 MB 2015-01-12 23:14 5ba07f 2 
vm-1200-disk-1 0 20 GB 6.1 GB 0.0 MB 2014-12-31 11:49 6117ce 2 
vm-1202-disk-1 0 20 GB 3.8 GB 0.0 MB 2015-01-05 15:12 629477 2 
vm-1221-disk-1 0 20 GB 2.2 GB 0.0 MB 2015-01-07 21:20 7ba7b8 2 
vm-1221-disk-2 0 50 GB 1.1 GB 0.0 MB 2015-01-12 22:28 7ba96c 2 
vm-1221-disk-3 0 50 GB 1.7 GB 0.0 MB 2015-01-12 22:28 7bab1f 2 
vm-1203-disk-1 0 20 GB 3.5 GB 0.0 MB 2015-01-05 15:34 979e2c 2 
vm-1234-disk-2 0 300 GB 195 GB 0.0 MB 2014-10-27 17:29 9e1194 2 
vm-1234-disk-1 0 20 GB 2.9 GB 0.0 MB 2014-10-10 23:44 9e16ae 2 
vm-1222-disk-1 0 20 GB 2.1 GB 0.0 MB 2015-01-07 21:25 a387d9 2 
vm-1222-disk-3 0 50 GB 1.7 GB 0.0 MB 2015-01-12 22:30 a38b40 2 
vm-1222-disk-2 0 50 GB 1.1 GB 0.0 MB 2015-01-12 22:29 a38cf3 2 
vm-1044-disk-1 0 20 GB 1.5 GB 0.0 MB 2014-12-30 16:48 ac07f6 2 
vm-1211-disk-2 0 800 GB 10 GB 684 GB 2015-01-12 10:06 c2ec58 2 
vm-1211-disk-1 0 20 GB 172 MB 1.7 GB 2015-01-12 10:06 c2ee0f 2 
vm-1041-disk-1 0 20 GB 2.0 GB 0.0 MB 2014-10-27 12:11 de2b0f 2 
vm-1045-disk-1 0 20 GB 1.5 GB 0.0 MB 2014-12-30 17:06 e111ab 2 
vm-1201-disk-1 0 20 GB 3.9 GB 0.0 MB 2015-01-04 12:04 f8b1d2 2 
vm-1204-disk-1 0 20 GB 3.6 GB 0.0 MB 2015-01-05 15:37 fb24b1 2 
=============================================================================== 




===================================== 

px1 /var/lib/sheepdog > du -h 
0 ./obj/.stale 
2.8T ./obj 
44K ./epoch 
2.8T . 



px2 /var/lib/sheepdog > du -h 
1.7T ./obj/.stale 
2.9T ./obj 
44K ./epoch 
2.9T . 

===================================== 






I deleted the trashed VDIs and checked/repaired the remaining ones (and even recreated some), roughly with the commands sketched below. 
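The exact names varied; these are just examples taken from the listing above: 

px1 ~ > dog vdi delete vm-XXXX-disk-1     # drop a trashed VDI (example name) 
px1 ~ > dog vdi check vm-1040-disk-1      # check/repair the replicas of a surviving VDI 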


Still, I ended up in a strange situation: 

- As you can see, I have quite a few VDIs. Most of them are small (20 GB) and mostly unused (most effectively use less than 10% of their allocated space), except for 2 or 3 of them. The total allocated VDI size is 1.6 TB, which, if I understand correctly, means 3.2 TB occupied on the cluster since every VDI is formatted with 2 copies. 
Concerning real usage, adding "used" and "shared" I am around 1 TB (by the way, what is the difference between used and shared?), so in fact around 2 TB with the copies. 
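To double-check those figures I summed the listing; something along these lines should give the totals, assuming the raw (-r) output prints byte values with size/used/shared in columns 4 to 6, which I haven't verified: 

px1 ~ > dog vdi list -r | awk '{ size += $4; used += $5; shared += $6 } 
          END { printf "size %.1f GB  used %.1f GB  shared %.1f GB\n", size/1024/1024/1024, used/1024/1024/1024, shared/1024/1024/1024 }' 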


Sheepdog reports a total cluster capacity of about 5.9 TB, so normally I have plenty of space: nearly 4 TB should be free, which with 2 copies means about 2 TB of really usable space. 
But as you can see, "dog node info" says the cluster is nearly full (94%) when it should be around 30% used... why? 
Looking at the "du" output in /var/lib/sheepdog, node 1 ("px1") has 2.8 TB in the "obj" folder and nothing in ".stale", while "px2" has nearly the same amount of data in "obj" (but not exactly the same value, and again: why? since we're using simple replication, shouldn't I end up with the same value?) and, above all, 1.7 TB in ".stale". 
I guess the ".stale" objects are hard links to the live objects, because the numbers don't add up otherwise: 1.7 TB of stale objects plus roughly the same amount of live data as px1 should be well over 2.9 TB, yet du reports 2.9 TB total for "obj". 


Well ... 
That is pretty strange, and I don't know how to fix this situation. 
Apparently every remaining VDI is working properly and I have had no trouble with them, but since Sheepdog reports almost no space available, I can't create any more VDIs. 


I took measures to avoid overloading the network again: I reduced the MTU to 1400 to avoid packet fragmentation (as I read somewhere) and limited rsync's bandwidth so Corosync traffic has some headroom, roughly as sketched below. So far it seems stable, but now, what can I do? 
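Just to show what I mean (interface name, limit and paths are only examples): 

px1 ~ > ip link set dev eth0 mtu 1400                      # example interface name 
px1 ~ > rsync -a --bwlimit=20000 /local/data/ vm:/data/    # cap rsync at ~20 MB/s; paths are examples 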
I was thinking of backing up every VDI, then deleting and recreating the cluster to clean everything up, but that would take quite a while and I haven't had time to do it yet, so if you have a better idea I would be grateful. 
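The backup itself would be something like this per VDI, I think (untested, and /backup is just an example path): 

px1 ~ > dog vdi read vm-1040-disk-1 > /backup/vm-1040-disk-1.raw                              # dump a VDI to a raw image file 
px1 ~ > qemu-img convert -p -O qcow2 sheepdog:vm-1040-disk-1 /backup/vm-1040-disk-1.qcow2     # or convert via qemu-img 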


Best regards, 


Walid Moghrabi 



----- Original Message -----

From: "Fabian Zimmermann" <dev.faz at gmail.com> 
To: "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>, "Walid Moghrabi" <walid.moghrabi at lezard-visuel.com> 
Cc: sheepdog-users at lists.wpkg.org 
Sent: Tuesday, 13 January 2015 08:21:18 
Subject: Re: [sheepdog-users] What can I do with my .stale objects ? 

Hi 

On 13.01.15 at 06:58, Hitoshi Mitake wrote: 
> Remaining stale objects would be a bug of sheepdog, could you provide log of the node? 
same problem here (remaining objects in .stale) - will try to reproduce 
the problem. 

Fabian 



