[sheepdog-users] cluster-full due to different size devices

Valerio Pachera sirio81 at gmail.com
Wed Jun 19 09:07:16 CEST 2013

2013/6/19 MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>:
> Valerio, can you make sure that /mnt/backup is dedicated to the sheep
> daemon?

Sorry, not all of the directories listed before are used by sheepdog
(/mnt/dati, /mnt/backup, and /mnt/cubonas are not).

 parallel-ssh  -i -h etc/pssh.conf 'df -h | grep mnt'
[1] 23:50:37 [SUCCESS] sheepdog004
/dev/sdc1                 466G  232G    234G  50% /mnt/wd_WCAYUEP99298
/dev/sdd1                 1,9T  834G    1,1T  45% /mnt/wd_WCAWZ1588874
[2] 23:50:37 [SUCCESS] sheepdog002
/dev/sdb1                   466G  403G     64G  87% /mnt/wd_WMAYP1690412
/dev/sdc1                   1,9T  762G    1,1T  41%
[3] 23:50:37 [SUCCESS] sheepdog003
/dev/sda3       217G  146G     71G  68% /mnt/sheep/dsk01
/dev/sdb1       2,8T  1,1T    1,7T  40% /mnt/sheep/dsk02
[4] 23:50:37 [SUCCESS] sheepdog001
/dev/sdc1            1,9T  1,1T    768G  59% /mnt/ST2000DM001-1CH164_W1E2N5G6

This was the only disk getting full: /mnt/wd_WMAYP1690412.

Note that sheepdog001 is using only a single disk because I had to
unplug its 500G disk earlier, for the same reason.

Last night I went to sleep after unplugging the disk
(/mnt/wd_WMAYP1690412) and starting a guest on sheepdog004.

This morning I hit two main problems (one seen before, one new):
- The sheepdog004 daemon died again while flushing the cache (which I
had deleted first with collie vdi cache delete).
    Jun 18 23:40:08 [oc_push 12761] push_cache_object(471) failed to
push object No object found
    Jun 18 23:40:08 [oc_push 12761] do_push_object(841) PANIC: push
failed but should never fail
  It seems to be a node-related problem.
- root at sheepdog002:~# collie node md info
  Id      Size    Used    Avail   Use%    Path
  It looks like the disk unplug didn't work properly: the list is empty,
  and the disk got full anyway!
  /dev/sdb1   466G  466G    4,0K 100% /mnt/wd_WMAYP1690412
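To catch a filling disk before it hits 100%, one option is to parse the
per-disk usage table (the same Id/Size/Used/Avail/Use%/Path layout that
"collie node md info" prints above) and warn past a threshold. This is
only a sketch: the sample input below is made up for illustration, and in
practice the data would come from running the command on each node (e.g.
via parallel-ssh, as in the df example earlier).

```shell
#!/bin/sh
# Sketch: flag md disks above a Use% threshold.
# The sample table below is hypothetical; in a real setup it would be
# the output of "collie node md info" on each node.
THRESHOLD=85

md_info_sample() {
cat <<'EOF'
Id      Size    Used    Avail   Use%    Path
0       466G    403G    64G     87%     /mnt/wd_WMAYP1690412
1       1.9T    762G    1.1T    41%     /mnt/wd_WCAWZ1588874
EOF
}

md_info_sample | awk -v t="$THRESHOLD" '
    NR > 1 {                        # skip the header line
        use = $5; sub(/%/, "", use) # strip the % sign
        if (use + 0 > t)
            print "WARNING:", $6, "is", $5, "used"
    }'
```

An empty table like the one above, or a disk past the threshold, would
then show up in a cron mail instead of being discovered after the fact.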

Not to confuse you, but to give you all the info I have, here is the
prequel: yesterday (~16:30) we had a problem with the switch, and the
cluster went down ungracefully.
Once it was up again, sheepdog002 was not showing all the vdis.
I killed the node and inserted it back, after which it showed all the vdis.
I waited until evening (21:00) and also ran vdi checks.
After 23:00 I noticed the disk filling up.
The rest is written above.
