[sheepdog] [PATCH] sheep: check resource limit at startup

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Jan 29 05:56:53 CET 2013


At Tue, 29 Jan 2013 12:35:13 +0800,
Liu Yuan wrote:
> 
> On 01/29/2013 12:19 PM, MORITA Kazutaka wrote:
> > Another option is to handle the EMFILE error gracefully (e.g. retrying
> > after some file descriptors are closed) so that VMs aren't aware of
> > sheepdog internal errors.
> 
> This is the hardest part. Object cache needs eventfd, local requests need
> eventfd, and people don't want sheep to go out of service just because
> of a temporary lack of FDs.
> 
> Actually, the sockfd cache can survive an FD outage: it just returns an
> error to the caller. It is sheep's internal logic that can't handle EMFILE.

I've confirmed before that sheep failed to open an object file because
there were no free file descriptors, which led to EIO on running VMs.
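
For reference, the "retry after some file descriptors are closed" idea
quoted above could look roughly like the sketch below. This is only an
illustration, not code from sheep: the helper name open_retry_emfile and
its max_retries/delay_ms parameters are hypothetical, and it assumes the
caller does not pass O_CREAT (no mode argument is given to open).

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sheep): open a file and, if the
 * process has run out of file descriptors (EMFILE), wait a little and
 * try again, up to max_retries extra attempts.  Any other error is
 * returned to the caller immediately with errno left intact.
 */
static int open_retry_emfile(const char *path, int flags,
			     int max_retries, long delay_ms)
{
	struct timespec ts;
	int fd, tries = 0;

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (;;) {
		fd = open(path, flags);
		if (fd >= 0 || errno != EMFILE || tries++ >= max_retries)
			return fd;
		/* Give other threads a chance to close some descriptors. */
		nanosleep(&ts, NULL);
	}
}
```

A bounded retry only hides short FD shortages; if the limit is simply
too low for the workload, the open still fails after max_retries
attempts and the error reaches the VM as before.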

If it's difficult to limit or reduce the number of FDs consumed,
let's find a safe value for RLIMIT_NOFILE that we should suggest.  I
think the number depends heavily on the number of sheep nodes and
running VMs.
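
As a rough sketch of what a startup check could look like once we agree
on a value: the code below compares the soft RLIMIT_NOFILE against a
required number using getrlimit(2).  The function names and the linear
estimate (a base plus a per-node and a per-VM cost) are assumptions for
illustration only; the actual coefficients still need to be measured.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical estimate of how many FDs sheep needs.  The linear form
 * and all coefficients are assumptions supplied by the caller; no
 * measured values are implied here.
 */
static unsigned long estimate_nofile(unsigned long nr_nodes,
				     unsigned long nr_vms,
				     unsigned long fds_per_node,
				     unsigned long fds_per_vm,
				     unsigned long base)
{
	return base + nr_nodes * fds_per_node + nr_vms * fds_per_vm;
}

/*
 * Return 0 if the soft RLIMIT_NOFILE is at least 'required',
 * -1 if it is lower or if the limit cannot be read.
 */
static int check_nofile_limit(rlim_t required)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
		perror("getrlimit");
		return -1;
	}

	if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < required) {
		fprintf(stderr,
			"RLIMIT_NOFILE is too low: soft limit %llu, need at least %llu\n",
			(unsigned long long)rl.rlim_cur,
			(unsigned long long)required);
		return -1;
	}

	return 0;
}
```

Whether sheep should refuse to start or only print a warning when the
check fails is a separate decision; the sketch just reports the result.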

Thanks,

Kazutaka


