[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE

Liu Yuan namei.unix at gmail.com
Thu Jul 18 07:36:30 CEST 2013


On Thu, Jul 18, 2013 at 02:33:20PM +0900, Hitoshi Mitake wrote:
> At Thu, 18 Jul 2013 10:26:22 +0800,
> Liu Yuan wrote:
> > 
> > On Thu, Jul 18, 2013 at 12:46:24AM +0900, Hitoshi Mitake wrote:
> > > At Wed, 17 Jul 2013 16:56:10 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > > > This patch adds calls to shrink_sockfd() after connect() and
> > > > > accept() when they fail with EMFILE.
> > > > > 
> > > > > Each call is retried at most twice. This policy should be
> > > > > improved in the future.
> > > > > 
> > > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > ---
> > > > >  sheep/request.c      |    7 +++++++
> > > > >  sheep/sockfd_cache.c |    6 ++++++
> > > > >  2 files changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/sheep/request.c b/sheep/request.c
> > > > > index 3b43c76..8f3f840 100644
> > > > > --- a/sheep/request.c
> > > > > +++ b/sheep/request.c
> > > > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > > > >  	}
> > > > >  
> > > > >  	namesize = sizeof(from);
> > > > > +
> > > > > +	int retry = 2;
> > > > > +retry_accept:
> > > > >  	fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > > > >  	if (fd < 0) {
> > > > > +		if (errno == EMFILE && retry--) {
> > > > > +			if (shrink_sockfd())
> > > > > +				goto retry_accept;
> > > > > +		}
> > > > >  		sd_eprintf("failed to accept a new connection: %m");
> > > > >  		return;
> > > > 
> > > > Better to use xaccept() to handle the accept() retry internally, and
> > > > document why the retry count is 2.
> > > 
> > > I don't have a reason for 2 at the moment. I'd like to profile sheep
> > > with tools like LTTng or perf and look for a better retry count later.
> > 
> > No, I think we should keep retrying on EMFILE until it succeeds.
> 
> Retrying forever is too aggressive, because in theory EMFILE can also be
> caused by too many open files. I think a threshold is required.
> 

Why do we need a threshold? I don't see the point. Please explain your theory.
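
To make the two policies concrete, here is a minimal sketch of such a wrapper.
It is not the code from the patch: accept_retry_emfile(), shrink_fn and
shrink_stub() are hypothetical names used only for illustration, and in sheep
the callback would presumably be shrink_sockfd(). A negative max_retry keeps
retrying for as long as the callback reports that it freed something; a
non-negative value acts as the threshold.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical callback type: returns true if it closed at least one fd. */
typedef bool (*shrink_fn)(void);

/* Hypothetical stand-in for shrink_sockfd(); it only counts its calls. */
static int shrink_calls;
static bool shrink_stub(void)
{
	shrink_calls++;
	return false;	/* pretend there was nothing left to close */
}

/*
 * accept() wrapper that retries only when accept() fails with EMFILE.
 * max_retry < 0:  no threshold, retry while shrink() keeps freeing fds.
 * max_retry >= 0: give up after that many retries.
 * Any other error is returned to the caller unchanged.
 */
static int accept_retry_emfile(int listen_fd, struct sockaddr *addr,
			       socklen_t *len, shrink_fn shrink, int max_retry)
{
	for (;;) {
		int fd = accept(listen_fd, addr, len);

		if (fd >= 0 || errno != EMFILE)
			return fd;

		if (max_retry == 0 || !shrink || !shrink()) {
			/* shrink() may have clobbered errno; restore it */
			errno = EMFILE;
			return -1;
		}
		if (max_retry > 0)
			max_retry--;
	}
}
```

With max_retry = 2 this gives the same bound as the patch. With -1 the loop
still terminates once the callback reports that nothing more can be closed.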

> > 
> > > 
> > > We shouldn't use the x prefix for the retrying versions of
> > > socket-producing functions, because many x-prefixed functions are in
> > > libsheepdog and the retrying functions cannot be moved to libsheepdog.
> > > They depend on sockfd.
> > 
> > Then sd_open() is a much better name.
> 
> I'll move sockfd to libsheepdog, so using the x prefix is okay.

Perhaps the sockfd cache can't be used by others, because of the IO NIC code
and the node management code.

Thanks
Yuan


