[sheepdog-users] md remove has to trigger recovery in some cases

Valerio Pachera sirio81 at gmail.com
Fri May 10 11:24:35 CEST 2013


In short: when we unplug a disk, if its data can't be redistributed/rebuilt
on the node's other disks, we have to trigger a cluster recovery.
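To make the proposal concrete, here is a minimal Python sketch of the check I have in mind. The names (`Disk`, `unplug_needs_cluster_recovery`) are hypothetical and this is not how sheep is actually implemented. It only compares the data held on the disk being unplugged with the free space left on the node's other disks, using node 2's figures from the `collie node md info --all` output in this mail.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    path: str
    used_gb: float
    free_gb: float


def unplug_needs_cluster_recovery(disks, unplugged_path):
    """Hypothetical policy: True if the data on the unplugged disk cannot
    fit in the free space of the node's remaining disks."""
    targets = [d for d in disks if d.path == unplugged_path]
    if not targets:
        raise ValueError(f"unknown disk: {unplugged_path}")
    remaining = [d for d in disks if d.path != unplugged_path]
    if not remaining:
        # No other local disk: the data can only be rebuilt on other nodes.
        return True
    remaining_free_gb = sum(d.free_gb for d in remaining)
    return targets[0].used_gb > remaining_free_gb


# Node 2 before the unplug, as reported by "collie node md info --all".
node2 = [
    Disk("/mnt/sheepdog/obj", used_gb=15, free_gb=35),
    Disk("/mnt/sdb1", used_gb=1.1, free_gb=3.5),
]

print(unplug_needs_cluster_recovery(node2, "/mnt/sheepdog/obj"))  # True: 15 GB > 3.5 GB free
print(unplug_needs_cluster_recovery(node2, "/mnt/sdb1"))          # False: 1.1 GB fits in 35 GB
```

With a check like this, unplugging /mnt/sheepdog/obj on node 2 would fall back to a cluster recovery instead of filling /mnt/sdb1.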

Here is the reason:

3 nodes, each with a single 50 GB disk.
  collie cluster format -c 2

Created 2 pre-allocated vdis just to use up some space:
  collie vdi create -P garbage 10G
  collie vdi create -P garbagebid 10G

Added a small disk on test006:
root@test006:~# collie node md plug /mnt/sdb1/

root@test006:~# collie node info
Id      Size    Used    Use%
 0      50 GB   12 GB    24%
 1      50 GB   12 GB    24%
 2      55 GB   16 GB    28%
Total   154 GB  40 GB    25%
Total virtual image size        20 GB
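The totals match the replication level: two fully pre-allocated 10 GB vdis with `-c 2` means every object is stored twice. A quick check of that arithmetic (it does not model how sheepdog places objects on nodes, so it says nothing about the uneven 12/12/16 GB split):

```python
# Pre-allocated vdis created with "collie vdi create -P".
vdi_sizes_gb = {"garbage": 10, "garbagebid": 10}
copies = 2  # from "collie cluster format -c 2"

total_virtual_gb = sum(vdi_sizes_gb.values())  # "Total virtual image size"
total_stored_gb = total_virtual_gb * copies    # space used across all nodes

print(total_virtual_gb, total_stored_gb)  # 20 40
```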

root@test004:~# collie node md info --all
Id      Used    Free    Path
Node 0:
 0      12 GB   38 GB   /mnt/sheepdog/obj
Node 1:
 0      12 GB   38 GB   /mnt/sheepdog/obj
Node 2:
 0      15 GB   35 GB   /mnt/sheepdog/obj
 1      1.1 GB  3.5 GB  /mnt/sdb1

Now the funny part :-) Remove the big disk:

root@test006:~# collie node md unplug /mnt/sheepdog/obj

I saw 'node recovery' running for a while on 192.168.2.46 (id 2):
sheepdog kept writing data to the small disk until it was completely full.
After that, nothing seemed to change (as we said, unplugging a disk
does not trigger a cluster recovery).

root@test004:~# collie node md info --all
Id      Used    Free    Path
Node 0:
 0      12 GB   38 GB   /mnt/sheepdog/obj
Node 1:
 0      12 GB   38 GB   /mnt/sheepdog/obj
Node 2:
 0      4.6 GB  2.8 MB  /mnt/sdb1

!!!The vdis are not usable now!!!

root@test004:~# collie vdi check garbage
  FATAL: failed to write, Server has no space for new objects

I have no guests running, so I have no idea what would happen to them
(data corruption, I/O errors...).

It's not possible to plug the disk back in (I cleared its data first):
collie node md plug /mnt/sdb1/
  Failed to execute request, look for sheep.log for more information

I forced a cluster recovery by killing the node:

root@test006:~# collie node kill 2

root@test005:~# collie node info
Id      Size    Used    Use%
 0      50 GB   20 GB    40%
 1      50 GB   20 GB    40%
Total   100 GB  40 GB    40%
Total virtual image size        20 GB

root@test005:~# collie node md info --all
Id      Used    Free    Path
Node 0:
 0      20 GB   30 GB   /mnt/sheepdog/obj
Node 1:
 0      20 GB   30 GB   /mnt/sheepdog/obj
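These numbers are what one would expect: with only 2 nodes left and `-c 2`, each object needs one copy on each node, so every node stores the whole 20 GB of virtual image data, which fits on a 50 GB disk. A small check, assuming every vdi object is allocated (as with `-P`); the helper name is hypothetical:

```python
def per_node_gb(total_virtual_gb: float, copies: int, nodes: int) -> float:
    """Space each node needs when copies == nodes: every object has one
    replica on every node, so each node stores the full virtual size."""
    if copies != nodes:
        raise ValueError("only the copies == nodes case is exact")
    return total_virtual_gb * copies / nodes


need = per_node_gb(20, copies=2, nodes=2)
capacity = 50  # GB per remaining node
print(need, need <= capacity)  # 20.0 True
```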

vdi check now completes successfully on both vdis.


