[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE

Liu Yuan namei.unix at gmail.com
Thu Jul 18 04:25:03 CEST 2013


On Thu, Jul 18, 2013 at 12:54:20AM +0900, Hitoshi Mitake wrote:
> At Thu, 18 Jul 2013 00:47:42 +0900,
> Hitoshi Mitake wrote:
> > 
> > At Wed, 17 Jul 2013 16:57:02 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > > This patch calls shrink_sockfd() and retries when connect() or
> > > > accept() fails with EMFILE.
> > > > 
> > > > Each call is retried at most twice. This policy should be
> > > > improved in the future.
> > > > 
> > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > ---
> > > >  sheep/request.c      |    7 +++++++
> > > >  sheep/sockfd_cache.c |    6 ++++++
> > > >  2 files changed, 13 insertions(+)
> > > > 
> > > > diff --git a/sheep/request.c b/sheep/request.c
> > > > index 3b43c76..8f3f840 100644
> > > > --- a/sheep/request.c
> > > > +++ b/sheep/request.c
> > > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > > >  	}
> > > >  
> > > >  	namesize = sizeof(from);
> > > > +
> > > > +	int retry = 2;
> > > > +retry_accept:
> > > >  	fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > > >  	if (fd < 0) {
> > > > +		if (errno == EMFILE && retry--) {
> > > > +			if (shrink_sockfd())
> > > > +				goto retry_accept;
> > > > +		}
> > > >  		sd_eprintf("failed to accept a new connection: %m");
> > > >  		return;
> > > >  	}
> > > > diff --git a/sheep/sockfd_cache.c b/sheep/sockfd_cache.c
> > > > index 37a6a5e..ffab647 100644
> > > > --- a/sheep/sockfd_cache.c
> > > > +++ b/sheep/sockfd_cache.c
> > > > @@ -390,9 +390,15 @@ grab:
> > > >  	}
> > > >  
> > > >  	/* Create a new cached connection for this node */
> > > > +	int retry = 2;
> > > > +retry_connect:
> > > >  	sd_dprintf("create cache connection %s:%d idx %d", name, port, idx);
> > > >  	fd = connect_to(name, port);
> > > >  	if (fd < 0) {
> > > > +		if (errno == EMFILE && retry--) {
> > > > +			if (shrink_sockfd())
> > > > +				goto retry_connect;
> > > > +		}
> > > 
> > > Teach connect_to to retry internally.
> > 
> > connect_to() is in libsheepdog so we can't let it retry because
> > libsheepdog can't use sockfd.
> 
> On second thought, moving sockfd from sheep to libsheepdog would
> be beneficial for collie, because some collie subcommands call
> collie_exec_req() many times, and connecting to sheep each time
> adds overhead. For example, on a sheepdog cluster with a few thousand
> vdis, collie vdi list takes more than ten seconds. That overhead
> cannot be ignored.
> 
> What do you think about this idea? If it is acceptable, I'll do this in
> v2.
 
Please do it as a separate patch set before this one. It does not look like an
easy job, because some of the node management code and the IO NIC support in the
sockfd cache aren't needed at all by libsheepdog. After stripping those out, I
guess there isn't much shared code left.
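
For reference, one way the retry could live next to connect_to() without
libsheepdog depending on the sockfd cache is to have the caller pass in a
"reclaim" callback. The following is only a minimal sketch under that
assumption: connect_with_retry(), the two callback types and the demo_*
stand-ins are hypothetical names, not existing sheepdog code. In sheep the
reclaim callback would presumably wrap shrink_sockfd(), assuming it returns
nonzero when it managed to close something, as its use in the patch suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types, not part of the sheepdog API. */
typedef int (*connect_fn_t)(const char *name, int port);
typedef bool (*reclaim_fn_t)(void);

/*
 * Call do_connect(). If it fails with EMFILE, ask reclaim() to release
 * some descriptors and try again, at most max_retries times. Any other
 * error, or a reclaim() that frees nothing, ends the loop.
 * Returns the descriptor, or -1 with errno set by the failed connect.
 */
static int connect_with_retry(connect_fn_t do_connect, reclaim_fn_t reclaim,
			      const char *name, int port, int max_retries)
{
	assert(do_connect != NULL);

	for (;;) {
		int saved_errno;
		int fd = do_connect(name, port);

		if (fd >= 0)
			return fd;

		/* reclaim() may clobber errno, so keep the original value */
		saved_errno = errno;
		if (saved_errno != EMFILE || max_retries <= 0 ||
		    reclaim == NULL || !reclaim()) {
			errno = saved_errno;
			return -1;
		}
		max_retries--;
	}
}

/* Demo stand-ins that simulate a process sitting at its fd limit. */
static int demo_free_slots;	/* descriptors still available */
static int demo_cached;		/* idle cached connections that can be closed */

static int demo_connect(const char *name, int port)
{
	(void)name;
	(void)port;
	if (demo_free_slots == 0) {
		errno = EMFILE;
		return -1;
	}
	demo_free_slots--;
	return 3;	/* arbitrary fake descriptor number */
}

static bool demo_reclaim(void)
{
	if (demo_cached == 0)
		return false;
	demo_cached--;
	demo_free_slots++;
	return true;
}
```

With this shape, sheep could pass a wrapper around shrink_sockfd() as the
callback, while collie could pass NULL and keep today's behaviour of failing
on the first EMFILE.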

Thanks
Yuan


