[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE

Hitoshi Mitake mitake.hitoshi at gmail.com
Thu Jul 18 07:25:40 CEST 2013


At Thu, 18 Jul 2013 10:25:03 +0800,
Liu Yuan wrote:
> 
> On Thu, Jul 18, 2013 at 12:54:20AM +0900, Hitoshi Mitake wrote:
> > At Thu, 18 Jul 2013 00:47:42 +0900,
> > Hitoshi Mitake wrote:
> > > 
> > > At Wed, 17 Jul 2013 16:57:02 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > > > This patch calls shrink_sockfd() and retries when connect() or
> > > > > accept() fails with EMFILE.
> > > > > 
> > > > > Each call is retried at most twice. This policy must be
> > > > > improved in the future.
> > > > > 
> > > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > ---
> > > > >  sheep/request.c      |    7 +++++++
> > > > >  sheep/sockfd_cache.c |    6 ++++++
> > > > >  2 files changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/sheep/request.c b/sheep/request.c
> > > > > index 3b43c76..8f3f840 100644
> > > > > --- a/sheep/request.c
> > > > > +++ b/sheep/request.c
> > > > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > > > >  	}
> > > > >  
> > > > >  	namesize = sizeof(from);
> > > > > +
> > > > > +	int retry = 2;
> > > > > +retry_accept:
> > > > >  	fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > > > >  	if (fd < 0) {
> > > > > +		if (errno == EMFILE && retry--) {
> > > > > +			if (shrink_sockfd())
> > > > > +				goto retry_accept;
> > > > > +		}
> > > > >  		sd_eprintf("failed to accept a new connection: %m");
> > > > >  		return;
> > > > >  	}
> > > > > diff --git a/sheep/sockfd_cache.c b/sheep/sockfd_cache.c
> > > > > index 37a6a5e..ffab647 100644
> > > > > --- a/sheep/sockfd_cache.c
> > > > > +++ b/sheep/sockfd_cache.c
> > > > > @@ -390,9 +390,15 @@ grab:
> > > > >  	}
> > > > >  
> > > > >  	/* Create a new cached connection for this node */
> > > > > +	int retry = 2;
> > > > > +retry_connect:
> > > > >  	sd_dprintf("create cache connection %s:%d idx %d", name, port, idx);
> > > > >  	fd = connect_to(name, port);
> > > > >  	if (fd < 0) {
> > > > > +		if (errno == EMFILE && retry--) {
> > > > > +			if (shrink_sockfd())
> > > > > +				goto retry_connect;
> > > > > +		}
> > > > 
> > > > Teach connect_to to retry internally.
> > > 
> > > connect_to() is in libsheepdog, so we can't make it retry internally:
> > > libsheepdog can't use the sockfd cache, which lives in sheep.
> > 
> > On second thought, moving sockfd from sheep to libsheepdog would
> > be beneficial for collie, because some collie subcommands call
> > collie_exec_req() many times, and connecting to sheep each time
> > adds overhead. E.g. on a sheepdog cluster which has a few thousand
> > vdis, collie vdi list takes more than ten seconds. The overhead
> > cannot be ignored.
> > 
> > What do you think about this idea? If it is acceptable, I'll do this in
> > v2.
>  
> Please do it as a separate patch set before this patch set. This doesn't look like an easy
> job because some node management code and IO NIC support of the sockfd cache aren't
> needed at all by libsheepdog. After stripping out this stuff, I guess there
> isn't much shared code left.

OK, I'll do this in a separate patch set.
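
For reference, the retry policy discussed above (retry at most twice on
EMFILE, and only if shrinking the cache freed something) could be factored
into a helper that takes the shrink function as a callback, so the helper
itself doesn't depend on the sockfd cache. This is only a rough sketch under
that assumption: retry_on_emfile(), try_fn_t, shrink_fn_t, fake_connect() and
fake_shrink() are hypothetical names and not part of sheepdog. The fake_*
functions merely stand in for connect_to() and shrink_sockfd().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; not part of sheepdog. */
typedef int (*try_fn_t)(void *arg);	/* returns fd >= 0, or -1 with errno set */
typedef bool (*shrink_fn_t)(void);	/* returns true if descriptors were freed */

/*
 * Call try_fn(). If it fails with EMFILE, ask shrink_fn() to free
 * descriptors and try again, at most max_retry times.
 */
static int retry_on_emfile(try_fn_t try_fn, void *arg,
			   shrink_fn_t shrink_fn, int max_retry)
{
	int fd, saved_errno;

	assert(try_fn != NULL);

	for (;;) {
		fd = try_fn(arg);
		if (fd >= 0)
			return fd;

		/* shrink_fn() may clobber errno, so keep the original value */
		saved_errno = errno;
		if (saved_errno != EMFILE || max_retry-- <= 0 ||
		    shrink_fn == NULL || !shrink_fn()) {
			errno = saved_errno;
			return -1;
		}
	}
}

/* Stand-ins for connect_to() and shrink_sockfd(), for demonstration only. */
static int fake_failures;	/* EMFILE failures still to simulate */
static int fake_shrinks;	/* number of times fake_shrink() was called */

static int fake_connect(void *arg)
{
	(void)arg;
	if (fake_failures > 0) {
		fake_failures--;
		errno = EMFILE;
		return -1;
	}
	return 3;	/* pretend this is a valid descriptor */
}

static bool fake_shrink(void)
{
	fake_shrinks++;
	return true;
}
```

Restoring errno before returning also keeps a later "%m" in the error
message pointing at the original failure rather than at whatever the shrink
function last set.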

Thanks,
Hitoshi


