[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE

Hitoshi Mitake mitake.hitoshi at gmail.com
Wed Jul 17 17:54:20 CEST 2013


At Thu, 18 Jul 2013 00:47:42 +0900,
Hitoshi Mitake wrote:
> 
> At Wed, 17 Jul 2013 16:57:02 +0800,
> Liu Yuan wrote:
> > 
> > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > This patch adds calling of shrink_sockfd() after connect() and
> > > accept() when they return EMFILE.
> > > 
> > > These retries can be invoked twice at a maximum. This policy must be
> > > improved in the future.
> > > 
> > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > ---
> > >  sheep/request.c      |    7 +++++++
> > >  sheep/sockfd_cache.c |    6 ++++++
> > >  2 files changed, 13 insertions(+)
> > > 
> > > diff --git a/sheep/request.c b/sheep/request.c
> > > index 3b43c76..8f3f840 100644
> > > --- a/sheep/request.c
> > > +++ b/sheep/request.c
> > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > >  	}
> > >  
> > >  	namesize = sizeof(from);
> > > +
> > > +	int retry = 2;
> > > +retry_accept:
> > >  	fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > >  	if (fd < 0) {
> > > +		if (errno == EMFILE && retry--) {
> > > +			if (shrink_sockfd())
> > > +				goto retry_accept;
> > > +		}
> > >  		sd_eprintf("failed to accept a new connection: %m");
> > >  		return;
> > >  	}
> > > diff --git a/sheep/sockfd_cache.c b/sheep/sockfd_cache.c
> > > index 37a6a5e..ffab647 100644
> > > --- a/sheep/sockfd_cache.c
> > > +++ b/sheep/sockfd_cache.c
> > > @@ -390,9 +390,15 @@ grab:
> > >  	}
> > >  
> > >  	/* Create a new cached connection for this node */
> > > +	int retry = 2;
> > > +retry_connect:
> > >  	sd_dprintf("create cache connection %s:%d idx %d", name, port, idx);
> > >  	fd = connect_to(name, port);
> > >  	if (fd < 0) {
> > > +		if (errno == EMFILE && retry--) {
> > > +			if (shrink_sockfd())
> > > +				goto retry_connect;
> > > +		}
> > 
> > Teach connect_to to retry internally.
> 
> connect_to() is in libsheepdog so we can't let it retry because
> libsheepdog can't use sockfd.
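
For reference, the retry policy in the patch above can be sketched apart from sheep. The sketch below is only an illustration under stated assumptions: open_with_emfile_retry(), open_fd_fn, shrink_fn and the fake_* helpers are hypothetical names, not sheepdog code, and the shrink callback merely stands in for shrink_sockfd(). Passing the shrink step as a callback is one possible way to keep such a loop free of a direct dependency on the sockfd cache; it is not what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types, not part of sheepdog. */
typedef int (*open_fd_fn)(void *ctx);	/* returns fd >= 0, or -1 with errno set */
typedef bool (*shrink_fn)(void *ctx);	/* returns true if it released a descriptor */

/*
 * Call open_fd(); when it fails with EMFILE, ask shrink() to release a
 * cached descriptor and try again, at most max_retry additional times.
 */
static int open_with_emfile_retry(open_fd_fn open_fd, shrink_fn shrink,
				  void *ctx, int max_retry)
{
	assert(open_fd != NULL && shrink != NULL);

	for (;;) {
		int fd = open_fd(ctx);

		if (fd >= 0)
			return fd;
		if (errno != EMFILE || max_retry-- <= 0)
			return -1;
		if (!shrink(ctx)) {
			errno = EMFILE;	/* shrink() may have changed errno */
			return -1;
		}
	}
}

/* Toy descriptor table used only to exercise the loop. */
struct fake_table {
	int open_fds;	/* descriptors currently in use */
	int limit;	/* stands in for the per-process fd limit */
	int cached;	/* descriptors that shrink can give back */
};

static int fake_open(void *ctx)
{
	struct fake_table *t = ctx;

	if (t->open_fds >= t->limit) {
		errno = EMFILE;
		return -1;
	}
	return t->open_fds++;
}

static bool fake_shrink(void *ctx)
{
	struct fake_table *t = ctx;

	if (t->cached == 0)
		return false;
	t->cached--;
	t->open_fds--;
	return true;
}
```

With max_retry set to 2 the loop makes at most three attempts in total, which matches the "twice at a maximum" policy of the patch.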

On second thought, moving sockfd from sheep to libsheepdog would
be beneficial for collie, because some collie subcommands call
collie_exec_req() many times, and connecting to sheep each time
adds overhead. For example, on a sheepdog cluster with a few thousand
VDIs, collie vdi list takes more than ten seconds. That overhead
cannot be ignored.
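
To make the saving concrete, here is a minimal sketch of connection reuse, assuming a small fixed-size table keyed by host and port. All names in it (conn_cache, cache_get_fd(), connect_fn, counting_connect()) are hypothetical, and it does not reflect how sheep's sockfd cache is actually structured; it only shows that repeated requests to the same address can share one connect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical connect callback: returns fd >= 0, or -1 on failure. */
typedef int (*connect_fn)(const char *host, int port);

struct cached_conn {
	char host[64];
	int port;
	int fd;
};

struct conn_cache {
	struct cached_conn slot[CACHE_SLOTS];
	int used;
};

/*
 * Return a descriptor for host:port, connecting only on the first request.
 * Returns -1 when the table is full, the host name is too long, or the
 * connect callback fails.
 */
static int cache_get_fd(struct conn_cache *c, const char *host, int port,
			connect_fn do_connect)
{
	struct cached_conn *e;
	int i, fd;

	assert(c != NULL && host != NULL && do_connect != NULL);

	/* Reuse an existing connection when one is cached. */
	for (i = 0; i < c->used; i++) {
		e = &c->slot[i];
		if (e->port == port && strcmp(e->host, host) == 0)
			return e->fd;
	}

	if (c->used == CACHE_SLOTS || strlen(host) >= sizeof(c->slot[0].host))
		return -1;

	fd = do_connect(host, port);
	if (fd < 0)
		return -1;

	e = &c->slot[c->used++];
	snprintf(e->host, sizeof(e->host), "%s", host);
	e->port = port;
	e->fd = fd;
	return fd;
}

/* Fake connect that only counts how often it is called. */
static int connect_calls;

static int counting_connect(const char *host, int port)
{
	(void)host;
	(void)port;
	return 100 + connect_calls++;
}
```

In this toy setup, a thousand requests to the same address invoke the connect callback once. A real cache would also need invalidation of broken connections and locking, which are left out here.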

What do you think about this idea? If it is acceptable, I'll do this in
v2.

Thanks,
Hitoshi



