On 05/10/2013 05:24 PM, Valerio Pachera wrote:
> In short: when we unplug a disk, if the data can't be redistributed/rebuilt
> on the other node disks, we have to trigger a cluster recovery.
>
> Here is the reason:
>
> 3 nodes, each with a single disk of 50G.
> collie cluster format -c 2
>
> Created 2 pre-allocated disks just to use up space:
> collie vdi create -P garbage 10G
> collie vdi create -P garbagebid 10G
>
> Added a small disk:
> root@test006:~# collie node md plug /mnt/sdb1/
>
> root@test006:~# collie node info
> Id      Size    Used    Use%
> 0       50 GB   12 GB   24%
> 1       50 GB   12 GB   24%
> 2       55 GB   16 GB   28%
> Total   154 GB  40 GB   25%
>
> Total virtual image size        20 GB
>
> root@test004:~# collie node md info --all
> Id      Used    Free    Path
> Node 0:
>  0      12 GB   38 GB   /mnt/sheepdog/obj
> Node 1:
>  0      12 GB   38 GB   /mnt/sheepdog/obj
> Node 2:
>  0      15 GB   35 GB   /mnt/sheepdog/obj
>  1      1.1 GB  3.5 GB  /mnt/sdb1
>
> Now the funny part :-) remove the big disk:
>
> root@test006:~# collie node md unplug /mnt/sheepdog/obj
>
> I've seen 'node recovery' working for some time on 192.168.2.46 (id 2):
> sheepdog kept writing data to the small disk until it filled up.
> After that, it looks like nothing has changed (as we said, unplugging a
> disk does not trigger cluster recovery).
>
> root@test004:~# collie node md info --all
> Id      Used    Free    Path
> Node 0:
>  0      12 GB   38 GB   /mnt/sheepdog/obj
> Node 1:
>  0      12 GB   38 GB   /mnt/sheepdog/obj
> Node 2:
>  0      4.6 GB  2.8 MB  /mnt/sdb1
>
> !!! Vdi disks are not usable now !!!
>
> root@test004:~# collie vdi check garbage
> FATAL: failed to write, Server has no space for new objects
>
> I have no guests running.
> I have no idea what could happen (data corruption, I/O errors...).
>
> It's not possible to plug the disk back (I cleared its data first):
> collie node md plug /mnt/sdb1/
> Failed to execute request, look for sheep.log for more information

Unplugging a disk does indeed trigger node-wide recovery. No, you just plugged the wrong disk.
You should

$ collie node md plug /mnt/sheepdog/obj

and everything will be fine.

As for the disk-full condition: sheepdog can't handle it the way it handles EIO (by removing the failed disk), because removing a full disk could set off a chain of disk-full failures, where one full disk triggers another until the whole cluster is dead. So in your case the cluster is considered full. Yes: if one disk or one node is full, the whole cluster is full.

Thanks,
Yuan
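[Editor's note] A minimal recovery sketch based on Yuan's suggestion, using only the collie commands already shown in the thread; /mnt/sheepdog/obj and the vdi name garbage are the paths/names from Valerio's setup, and this assumes the original disk's object data is still intact:

```shell
# Re-plug the original large disk (the earlier attempt re-plugged
# the wrong one, /mnt/sdb1, whose data had been cleared)
collie node md plug /mnt/sheepdog/obj

# Confirm the per-disk layout on each node after recovery
collie node md info --all

# Once recovery finishes, re-check the vdi; the earlier
# "Server has no space for new objects" error should be gone
collie vdi check garbage
```

These commands must run against a live sheep daemon on the affected node, so this is an operational fragment rather than a standalone script.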