[sheepdog] [PATCH RFC 5/5] sheep: retry when connect() or accept() fails with EMFILE
Hitoshi Mitake
mitake.hitoshi at gmail.com
Thu Jul 18 07:25:40 CEST 2013
At Thu, 18 Jul 2013 10:25:03 +0800,
Liu Yuan wrote:
>
> On Thu, Jul 18, 2013 at 12:54:20AM +0900, Hitoshi Mitake wrote:
> > At Thu, 18 Jul 2013 00:47:42 +0900,
> > Hitoshi Mitake wrote:
> > >
> > > At Wed, 17 Jul 2013 16:57:02 +0800,
> > > Liu Yuan wrote:
> > > >
> > > > On Fri, Jul 12, 2013 at 10:54:26AM +0900, Hitoshi Mitake wrote:
> > > > > > This patch calls shrink_sockfd() and retries when connect() or
> > > > > > accept() fails with EMFILE.
> > > > >
> > > > > > Each call is retried at most twice. This policy should be
> > > > > > improved in the future.
> > > > >
> > > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > ---
> > > > > sheep/request.c | 7 +++++++
> > > > > sheep/sockfd_cache.c | 6 ++++++
> > > > > 2 files changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/sheep/request.c b/sheep/request.c
> > > > > index 3b43c76..8f3f840 100644
> > > > > --- a/sheep/request.c
> > > > > +++ b/sheep/request.c
> > > > > @@ -828,8 +828,15 @@ static void listen_handler(int listen_fd, int events, void *data)
> > > > > }
> > > > >
> > > > > namesize = sizeof(from);
> > > > > +
> > > > > + int retry = 2;
> > > > > +retry_accept:
> > > > > fd = accept(listen_fd, (struct sockaddr *)&from, &namesize);
> > > > > if (fd < 0) {
> > > > > + if (errno == EMFILE && retry--) {
> > > > > + if (shrink_sockfd())
> > > > > + goto retry_accept;
> > > > > + }
> > > > > sd_eprintf("failed to accept a new connection: %m");
> > > > > return;
> > > > > }
> > > > > diff --git a/sheep/sockfd_cache.c b/sheep/sockfd_cache.c
> > > > > index 37a6a5e..ffab647 100644
> > > > > --- a/sheep/sockfd_cache.c
> > > > > +++ b/sheep/sockfd_cache.c
> > > > > @@ -390,9 +390,15 @@ grab:
> > > > > }
> > > > >
> > > > > /* Create a new cached connection for this node */
> > > > > + int retry = 2;
> > > > > +retry_connect:
> > > > > sd_dprintf("create cache connection %s:%d idx %d", name, port, idx);
> > > > > fd = connect_to(name, port);
> > > > > if (fd < 0) {
> > > > > + if (errno == EMFILE && retry--) {
> > > > > + if (shrink_sockfd())
> > > > > + goto retry_connect;
> > > > > + }
> > > >
> > > > Teach connect_to to retry internally.
> > >
> > > connect_to() is in libsheepdog, so we can't make it retry internally:
> > > libsheepdog has no access to the sockfd cache.
> >
> > On second thought, moving sockfd from sheep to libsheepdog would
> > be beneficial for collie, because some collie subcommands call
> > collie_exec_req() many times, and connecting to sheep each time
> > adds overhead. For example, on a sheepdog cluster with a few thousand
> > vdis, collie vdi list takes more than ten seconds. That overhead
> > cannot be ignored.
> >
> > What do you think about this idea? If it is acceptable, I'll do this in
> > v2.
>
> Please do it as a separate patch set before this patch set. This doesn't look
> like an easy job, because some node management code and the IO NIC support in
> the sockfd cache aren't needed at all by libsheepdog. After stripping those
> out, I guess there isn't much shared code left.
OK, I'll do this in a separate patchset.
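
For reference, the retry policy discussed above (shrink the fd cache and
retry, at most a fixed number of times, when a call fails with EMFILE) could
be factored into one helper instead of two goto loops. The following is only a
rough sketch, not sheepdog code: open_fd_with_retry(), the callback types, and
the fake_pool stubs are all hypothetical names invented for illustration, and
the shrink callback merely stands in for shrink_sockfd().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types for illustration; not part of sheepdog. */
typedef int (*open_fd_fn)(void *arg);   /* returns fd >= 0, or -1 with errno set */
typedef bool (*shrink_fn)(void *arg);   /* returns true if an fd was released */

/*
 * Call open_fd(); if it fails with EMFILE, ask shrink() to release
 * cached descriptors and try again, at most max_retries times.
 * Returns the new fd, or -1 with errno describing the last failure.
 */
static int open_fd_with_retry(open_fd_fn open_fd, shrink_fn shrink,
			      void *arg, int max_retries)
{
	for (;;) {
		int fd = open_fd(arg);

		if (fd >= 0)
			return fd;
		if (errno != EMFILE || max_retries-- <= 0)
			return -1;
		if (!shrink(arg)) {
			errno = EMFILE;	/* shrink() may have changed errno */
			return -1;
		}
	}
}

/* Stand-in for a process fd table plus an fd cache (assumption, demo only). */
struct fake_pool {
	int free_slots;	/* descriptors that can still be opened */
	int cached;	/* cached descriptors that shrink can release */
	int attempts;	/* number of open attempts made so far */
};

/* Simulated accept()/connect(): fails with EMFILE when no slot is free. */
static int fake_open(void *arg)
{
	struct fake_pool *p = arg;

	p->attempts++;
	if (p->free_slots > 0) {
		p->free_slots--;
		return 3;	/* arbitrary valid-looking descriptor */
	}
	errno = EMFILE;
	return -1;
}

/* Simulated cache shrink: releases one cached descriptor if any exist. */
static bool fake_shrink(void *arg)
{
	struct fake_pool *p = arg;

	if (p->cached == 0)
		return false;
	p->cached--;
	p->free_slots++;
	return true;
}
```

With max_retries = 2 this makes at most three attempts in total, which matches
the "retry--" counting in the patch. Errors other than EMFILE are returned
immediately without touching the cache.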
Thanks,
Hitoshi