[sheepdog-users] cluster-full due to different size devices

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Wed Jun 19 06:56:54 CEST 2013


At Wed, 19 Jun 2013 10:02:39 +0800,
Liu Yuan wrote:
> 
> On 06/19/2013 06:10 AM, Valerio Pachera wrote:
> > It's happening again:
> > The node sheepdog002 is filling up its smaller device (500G, 87%,
> > /mnt/wd_WMAYP1690412).
> > The same is not happening on sheepdog004 (500G) or sheepdog003 (217G).
> > 
> > Note: I killed sheepdog002 and inserted it back into the cluster right away.
> > This triggered the cluster recovery.
> > When I noticed sheepdog002 was filling up its smaller disk, I tried to
> > run cluster reweight, but it didn't help.
> > 
> > 
> > parallel-ssh -i -h etc/pssh.conf 'df -h | grep mnt'
> > [1] 23:50:37 [SUCCESS] sheepdog004
> > /dev/mapper/vg01-bkp   296G  267G   14G  96% /mnt/backup
> > /dev/sdc1              466G  232G  234G  50% /mnt/wd_WCAYUEP99298
> > /dev/sdd1              1,9T  834G  1,1T  45% /mnt/wd_WCAWZ1588874
> > [2] 23:50:37 [SUCCESS] sheepdog002
> > /dev/mapper/vg00-dati  213G  144G   59G  72% /mnt/dati
> > /dev/sdb1              466G  403G   64G  87% /mnt/wd_WMAYP1690412
> > /dev/sdc1              1,9T  762G  1,1T  41% /mnt/ST2000DM001-1CH164_W1E2N5GM
> > [3] 23:50:37 [SUCCESS] sheepdog003
> > /dev/sda3              217G  146G   71G  68% /mnt/sheep/dsk01
> > /dev/sdb1              2,8T  1,1T  1,7T  40% /mnt/sheep/dsk02
> > /dev/sdc1              2,8T  1,5T  1,4T  52% /mnt/cubonas
> > [4] 23:50:37 [SUCCESS] sheepdog001
> > /dev/mapper/vg00-dati  192G  170G   12G  94% /mnt/dati
> > /dev/sdc1              1,9T  1,1T  768G  59% /mnt/ST2000DM001-1CH164_W1E2N5G6
> > 
> > Now I'm going to unplug /mnt/wd_WMAYP1690412, but I fear other "small"
> > devices are going to fill up too.
> > 
> > I can't follow the recovery because it's pretty late here now : /
> > 
> 
> Could you show the output of 'md info --all' so we can see whether any md devices are really full?
> 
> I can't easily reproduce this problem on my laptop. If there is a problem,
> it is a problem with our hash function. Kazutaka, can you look into this
> problem on your test cluster? If there is indeed a problem in sheep for

I couldn't reproduce the problem at all.

Valerio, can you make sure that /mnt/backup is dedicated to the sheep
daemon?  I suspect that you are putting some backup files in that
directory.  If there are other files that the sheep daemon is not
aware of, the disk can fill up faster than the others.
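The effect of files that sheep does not know about can be illustrated with a small simulation. This is a hypothetical sketch, not sheep's code: it assumes objects are spread over a node's disks by a consistent-hash ring whose virtual nodes are proportional to disk capacity, and it uses sheepdog's default 4 MB object size. The disk names ("small", "large"), the vnode density, and the 150 GB of foreign files are all made up for illustration; only the 466G and 1.9T capacities come from the df output above.

```python
import bisect
import hashlib

# Hypothetical disks (capacity in GB), sized like sheepdog002's two sheep disks.
DISKS = {"small": 466, "large": 1900}
VNODES_PER_GB = 4   # assumption: virtual nodes proportional to capacity
OBJECT_MB = 4       # sheepdog's default data object size

def h64(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

def build_ring(disks):
    """Return sorted ring positions and the disk that owns each position."""
    points = sorted(
        (h64(f"{name}/{i}"), name)
        for name, cap_gb in disks.items()
        for i in range(cap_gb * VNODES_PER_GB)
    )
    return [p for p, _ in points], [d for _, d in points]

def place(positions, owners, oid: str) -> str:
    """Pick the first virtual node clockwise from the object's hash."""
    idx = bisect.bisect_right(positions, h64(oid)) % len(positions)
    return owners[idx]

def simulate(n_objects: int, foreign_gb=None):
    """Return df-style usage fractions per disk after storing n_objects.

    foreign_gb models files written to a disk by something other than
    sheep: placement never sees them, but df counts them.
    """
    positions, owners = build_ring(DISKS)
    used_mb = {name: 0 for name in DISKS}
    for i in range(n_objects):
        used_mb[place(positions, owners, f"obj-{i}")] += OBJECT_MB
    foreign_gb = foreign_gb or {}
    return {
        name: (used_mb[name] / 1024 + foreign_gb.get(name, 0)) / cap
        for name, cap in DISKS.items()
    }

if __name__ == "__main__":
    balanced = simulate(200_000)
    # Hypothetical: 150 GB of non-sheep files on the small disk.
    skewed = simulate(200_000, foreign_gb={"small": 150})
    for name in DISKS:
        print(f"{name}: sheep only {balanced[name]:.0%}, "
              f"with foreign files {skewed[name]:.0%}")
```

Under these assumptions both disks reach roughly the same fill ratio from sheep data alone; only the disk that also holds foreign files shows a much higher Use% in df, which is the pattern suspected above.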

Thanks,

Kazutaka


