[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE

Hitoshi Mitake mitake.hitoshi at gmail.com
Thu Jul 18 10:47:16 CEST 2013


At Thu, 18 Jul 2013 13:36:30 +0800,
Liu Yuan wrote:
> 
> On Thu, Jul 18, 2013 at 02:33:20PM +0900, Hitoshi Mitake wrote:
> > At Thu, 18 Jul 2013 10:26:22 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Thu, Jul 18, 2013 at 12:46:24AM +0900, Hitoshi Mitake wrote:
> > > > At Wed, 17 Jul 2013 16:56:10 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > > > > > This patch adds calls to shrink_sockfd() after connect() and
> > > > > > > accept() when they fail with EMFILE.
> > > > > > > 
> > > > > > > Each call is retried at most twice. This policy must be
> > > > > > > improved in the future.
> > > > > > 
> > > > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > > ---
> > > > > >  sheep/request.c      |    7 +++++++
> > > > > >  sheep/sockfd_cache.c |    6 ++++++
> > > > > >  2 files changed, 13 insertions(+)
> > > > > > 
> > > > > > diff --git a/sheep/request.c b/sheep/request.c
> > > > > > index 3b43c76..8f3f840 100644
> > > > > > --- a/sheep/request.c
> > > > > > +++ b/sheep/request.c
> > > > > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > > > > >  	}
> > > > > >  
> > > > > >  	namesize = sizeof(from);
> > > > > > +
> > > > > > +	int retry = 2;
> > > > > > +retry_accept:
> > > > > >  	fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > > > > >  	if (fd < 0) {
> > > > > > +		if (errno == EMFILE && retry--) {
> > > > > > +			if (shrink_sockfd())
> > > > > > +				goto retry_accept;
> > > > > > +		}
> > > > > >  		sd_eprintf("failed to accept a new connection: %m");
> > > > > >  		return;
> > > > > 
> > > > > Better use xaccept() to handle retry accept internally and document why 2.
> > > > 
> > > > I don't have a reason for 2 currently. I'd like to profile sheep
> > > > with tools like LTTng or perf and look for a better retry count later.
> > > 
> > > No, I think we should try forever for EMFILE until success.
> > 
> > Retrying forever is too aggressive, because EMFILE can, in theory,
> > also be caused by too many open files. I think a threshold is required.
> > 
> 
> Why do we need a threshold? I don't see the point. Please explain your theory.
> 

In theory, subsystems other than sockfd can also exhaust file
descriptors. But that is probably a case we don't have to
consider, sorry.

How about this: basically, wrappers like xopen() retry until they
succeed. Only when the sockfd cache has no unused cached fds left to
release do they return EMFILE.
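
A minimal sketch of that policy, not the actual sheep code: the
cached_fds array stands in for the sockfd cache, and shrink_one() stands
in for shrink_sockfd(), whose real implementation is not shown in this
thread. The names open_retry(), cache_put() and shrink_one() are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

#define CACHE_MAX 16

/* Stand-in for the sockfd cache: descriptors that are open but unused. */
static int cached_fds[CACHE_MAX];
static int nr_cached;

/* Park an unused descriptor in the cache. */
static void cache_put(int fd)
{
	assert(nr_cached < CACHE_MAX);
	cached_fds[nr_cached++] = fd;
}

/*
 * Hypothetical stand-in for shrink_sockfd(): close one unused cached
 * descriptor. Returns false when the cache has nothing left to release.
 */
static bool shrink_one(void)
{
	if (nr_cached == 0)
		return false;
	close(cached_fds[--nr_cached]);
	return true;
}

/*
 * open() that keeps retrying on EMFILE for as long as the cache can
 * give a descriptor back. It fails with EMFILE only when the cache is
 * empty, so the loop ends after at most CACHE_MAX retries.
 */
static int open_retry(const char *path, int flags)
{
	for (;;) {
		int fd = open(path, flags);

		if (fd >= 0 || errno != EMFILE)
			return fd;
		if (!shrink_one()) {
			errno = EMFILE; /* close() may have changed errno */
			return -1;
		}
	}
}
```

With this shape no fixed retry count is needed: the number of retries is
bounded by the number of cached descriptors that can be released. The
same loop would apply to accept() and connect().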

> > > 
> > > > 
> > > > We shouldn't use the x prefix for the retrying versions of
> > > > socket-producing functions, because many x-prefixed functions are in
> > > > libsheepdog and the retrying functions cannot be moved to libsheepdog.
> > > > They depend on sockfd.
> > > 
> > > Then sd_open() is much better name.
> > 
> > I'll move sockfd to libsheepdog. So using x prefix is okay.
> 
> Perhaps, sockfd cache can't be used by others because of IO NIC code and node
> management code.

It would be difficult, as you say. If it is impossible, I'll implement
a minimal sockfd caching mechanism for collie.

BTW, I think using a unix domain socket between collie and sheep
would be beneficial when collie issues many requests. What do you
think?
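
For reference, the client side of such a connection could look roughly
like the sketch below. It uses only the standard AF_UNIX socket calls.
The helper name connect_unix() is hypothetical, and the socket path sheep
would listen on is an assumption, since sheep does not provide one in
this thread.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect to a stream-type unix domain socket bound at 'path'.
 * Returns the connected fd, or -1 with errno set.
 */
static int connect_unix(const char *path)
{
	struct sockaddr_un addr;
	size_t len = strlen(path);
	int fd, saved;

	/* sun_path is a small fixed-size array; reject paths that don't fit. */
	if (len >= sizeof(addr.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	memcpy(addr.sun_path, path, len + 1);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		saved = errno; /* keep connect()'s error across close() */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The returned fd can then be used with the same read/write paths as a TCP
socket, so the request code itself would not need to change.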

Thanks,
Hitoshi


