[Sheepdog] [RFC PATCH] sheep: add client side timeout support for socket

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Dec 6 08:59:59 CET 2011


At Mon, 5 Dec 2011 16:09:01 +0800,
Yibin Shen wrote:
> 
> 2011/11/27 MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>:
> > At Sat, 26 Nov 2011 19:06:18 +0800,
> > Liu Yuan wrote:
> >>
> >> On 11/26/2011 05:53 PM, Yibin Shen wrote:
> >>
> >> > oops, I found a regression with this patch
> >> >
> >> > Nov 26 11:19:13 store_queue_request(936) 3, 3, 412ca6000022d8 , 10
> >> > Nov 26 11:19:13 forward_write_obj_req(368) 412ca6000022d8
> >> > Nov 26 11:19:13 store_queue_request_local(843) 3, 412ca6000022d8 , 10
> >> > Nov 26 11:19:43 store_queue_request(967) failed: 3, 3, 412ca6000022d8 , 10, 3
> >> > Nov 26 11:19:43 io_op_done(147) leaving sheepdog cluster
> >> > Nov 26 11:19:43 sd_leave_handler(1291) network partition bug: this sheep should have exited
> >> > Nov 26 11:19:43 log_sigsegv(358) logger pid 9654 exiting abnormally
> >> >
> >> >
> >> > e.g., if an object has 3 copies and is hashed to (local, node A, node B),
> >> > then in a write operation, if node A leaves the cluster, IO towards node A
> >> > will time out after 30 sec.
> >> > But because we use a strong consistency model, the return value of
> >> > store_queue_request() will be set to SD_RES_EIO,
> >> > and then the io_op_done() function (sdnet.c) will call leave_cluster().
> >> >
> >> > 144        } else if (is_access_local(req->entry, req->nr_vnodes,
> >> > 145                                   ((struct sd_obj_req *)&req->rq)->oid, copies) &&
> >> > 146                   req->rp.result == SD_RES_EIO) {
> >> > 147                eprintf("leaving sheepdog cluster\n");
> >> > 148                leave_cluster();
> >> >
> >> > IMO, maybe we should:
> >> > 1) split store_queue_request() into multiple work items.
> >> > 2) replace strong consistency with eventual consistency or causal consistency.
> >> >
> >> > any comments?
> >> >
> >> > thanks
> >>
> >>
> >> I don't think now is the time to introduce other consistency models,
> >> which would bring in a lot of complexity.
> >>
> >> Whatever consistency model you use, you still need to handle EIO. IMO,
> >
> > It is completely wrong to set SD_RES_EIO when a timeout occurs, because
> > that error means a disk I/O error.  We must set SD_RES_NETWORK_ERROR in
> > this case so that the request will be retried after the epoch is updated.
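A minimal sketch of the distinction Kazutaka draws, assuming simplified result codes: a failure while talking to another sheep (including a timeout) maps to a retryable network error, and only a failure of the local object store maps to EIO. The SKETCH_RES_* values and both function names are hypothetical stand-ins, not sheepdog's actual SD_RES_* definitions or helpers.

```c
#include <errno.h>

/* Hypothetical result codes standing in for sheepdog's SD_RES_* values. */
enum sketch_result {
	SKETCH_RES_SUCCESS,
	SKETCH_RES_EIO,           /* the local disk failed: a real I/O error */
	SKETCH_RES_NETWORK_ERROR, /* a peer timed out or vanished: retryable */
};

/*
 * Map an errno value to a result code. 'from_socket' is nonzero when the
 * error came from talking to another sheep (connect/send/recv, including
 * timeouts such as ETIMEDOUT), and zero when it came from reading or
 * writing the local object store.
 */
enum sketch_result classify_error(int err, int from_socket)
{
	if (err == 0)
		return SKETCH_RES_SUCCESS;
	/* A timeout on the wire says nothing about the health of our disk. */
	if (from_socket)
		return SKETCH_RES_NETWORK_ERROR;
	return SKETCH_RES_EIO;
}

/*
 * Mirrors the io_op_done() condition quoted earlier in the thread: only
 * a local EIO should make this node leave the cluster.
 */
int should_leave_cluster(enum sketch_result r, int access_is_local)
{
	return access_is_local && r == SKETCH_RES_EIO;
}
```

With this split, the node A timeout in Yibin's scenario becomes a network error, so the io_op_done() check no longer fires and the request can instead be retried once the epoch changes.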
> >
> Yes, it works.
> > But I guess it is better to enable TCP keepalive.  If we use it, the
> > connection will be closed automatically after a timeout, so we don't
> > need to change the network I/O code at all.
> >
> Hmm, I don't think so. To solve this problem, we must enable client-side
> TCP keepalive, so we would have to modify the network I/O code,

All we need to do is set socket options in get_sheep_fd() and
listen_handler(), no?
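A minimal sketch of the socket options in question, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific). The helper name set_keepalive() and the timing values are hypothetical; presumably it would be called on the fd that get_sheep_fd() connects and on the fd that listen_handler() accepts.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable TCP keepalive on 'fd'. After 'idle' seconds
 * without traffic the kernel sends a probe every 'interval' seconds, and
 * drops the connection after 'count' unanswered probes.
 * Returns 0 on success, -1 (with errno set) on failure.
 */
int set_keepalive(int fd, int idle, int interval, int count)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
		       sizeof(interval)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
		return -1;
	return 0;
}
```

With the illustrative values idle = 5, interval = 1, count = 3, a silent peer would be detected after roughly 5 + 1 × 3 = 8 seconds, after which blocked reads and writes on the socket fail and the caller sees an error instead of hanging.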

> also, TCP keepalive will add overhead.

I think the overhead is small enough.

Thanks,

Kazutaka


> IMO, using poll() with a timeout is the simplest solution.
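A minimal sketch of the poll()-with-timeout alternative Yibin suggests. The helper name read_full_timeout() and the per-chunk timeout semantics are assumptions for illustration; on a timeout the caller would report a network error (such as SD_RES_NETWORK_ERROR) rather than EIO.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly 'len' bytes from 'fd', waiting at most
 * 'timeout_ms' for each chunk of data to arrive. Returns 0 on success; on
 * failure returns -1 with errno set (ETIMEDOUT if the peer went silent,
 * ECONNRESET if it closed the connection before sending everything).
 */
int read_full_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	char *p = buf;

	while (len > 0) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ret = poll(&pfd, 1, timeout_ms);

		if (ret < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal: wait again */
			return -1;
		}
		if (ret == 0) {
			errno = ETIMEDOUT; /* nothing arrived in time */
			return -1;
		}

		ssize_t n = read(fd, p, len);
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = ECONNRESET; /* peer closed early */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

One design difference from keepalive: keepalive probes are answered by the peer's kernel, so they detect a dead host or a broken link but not a sheep process that is alive yet stuck, whereas a poll() timeout bounds every wait regardless of why the peer is silent.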
> 
> > Thanks,
> >
> > Kazutaka
> >
> >> you could handle EIO even with the current strong model. In this case,
> >> A is gone, so you could
> >>
> >> 1) time out the write
> >> 2) wait for the cluster to recover (get a new hash)
> >> 3) do the write again.
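Yuan's three steps can be sketched as a retry loop. Everything here is hypothetical: the result codes, the function-pointer hooks, and the fake cluster at the bottom, which exists only to exercise the loop. In sheepdog, the "wait" step would correspond to waiting for the epoch to be updated, as described for SD_RES_NETWORK_ERROR earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical result codes; not sheepdog's real SD_RES_* values. */
enum wr_result { WR_SUCCESS, WR_NETWORK_ERROR, WR_EIO };

/* Hypothetical hooks supplied by the caller. */
typedef enum wr_result (*write_fn)(uint32_t epoch, void *ctx);
typedef uint32_t (*wait_epoch_fn)(uint32_t epoch, void *ctx);

/*
 * (1) the write fails with a network error (e.g. it timed out),
 * (2) wait until recovery produces a newer epoch, i.e. a new hash ring,
 * (3) redo the write against that epoch.
 * Success and EIO are returned immediately; max_retries bounds re-attempts.
 */
enum wr_result write_with_retry(uint32_t epoch, int max_retries,
				write_fn do_write, wait_epoch_fn wait_new_epoch,
				void *ctx)
{
	for (int attempt = 0;; attempt++) {
		enum wr_result r = do_write(epoch, ctx);

		if (r != WR_NETWORK_ERROR || attempt >= max_retries)
			return r;
		epoch = wait_new_epoch(epoch, ctx);
	}
}

/* Fake cluster used only to exercise the loop: writes keep timing out
 * until the epoch reaches 'recovered_epoch'. */
struct fake_cluster {
	uint32_t recovered_epoch;
	int writes;
};

enum wr_result fake_write(uint32_t epoch, void *ctx)
{
	struct fake_cluster *c = ctx;

	c->writes++;
	return epoch >= c->recovered_epoch ? WR_SUCCESS : WR_NETWORK_ERROR;
}

uint32_t fake_wait_new_epoch(uint32_t epoch, void *ctx)
{
	(void)ctx; /* a real implementation would block on membership changes */
	return epoch + 1;
}
```

For example, with recovered_epoch set to 3 and a starting epoch of 1, the write is attempted at epochs 1, 2 and 3 and succeeds on the third try.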
> >>
> >> The newest HEAD has already removed the lines that make sheep
> >> panic out in the error case. So currently, EIO will leave the node as a
> >> gateway for VMs. This is an acceptable compromise.
> >>
> >> Thanks,
> >> Yuan
> >>
> >> --
> >> sheepdog mailing list
> >> sheepdog at lists.wpkg.org
> >> http://lists.wpkg.org/mailman/listinfo/sheepdog


