At Mon, 5 Dec 2011 16:09:01 +0800,
Yibin Shen wrote:
>
> 2011/11/27 MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>:
> > At Sat, 26 Nov 2011 19:06:18 +0800,
> > Liu Yuan wrote:
> >>
> >> On 11/26/2011 05:53 PM, Yibin Shen wrote:
> >>
> >> > oops, I found a regression with this patch
> >> >
> >> > Nov 26 11:19:13 store_queue_request(936) 3, 3, 412ca6000022d8 , 10
> >> > Nov 26 11:19:13 forward_write_obj_req(368) 412ca6000022d8
> >> > Nov 26 11:19:13 store_queue_request_local(843) 3, 412ca6000022d8 , 10
> >> > Nov 26 11:19:43 store_queue_request(967) failed: 3, 3, 412ca6000022d8 , 10, 3
> >> > Nov 26 11:19:43 io_op_done(147) leaving sheepdog cluster
> >> > Nov 26 11:19:43 sd_leave_handler(1291) network partition bug: this sheep should have exited
> >> > Nov 26 11:19:43 log_sigsegv(358) logger pid 9654 exiting abnormally
> >> >
> >> >
> >> > e.g : if a object have 3 copies, and is hashed to (local, node A, node B)
> >> > then in a write operation, if node A leave cluster, IO towards node A
> >> > will timeout after 30sec,
> >> > but we use a strong consistency model, so return value of
> >> > store_request_queue will be set to SD_RES_EIO,
> >> > then io_op_done (sdnet.c) function will call leave_cluster .
> >> >
> >> > 144         } else if (is_access_local(req->entry, req->nr_vnodes,
> >> > 145                    ((struct sd_obj_req *)&req->rq)->oid, copies) &&
> >> > 146                    req->rp.result == SD_RES_EIO) {
> >> > 147                 eprintf("leaving sheepdog cluster\n");
> >> > 148                 leave_cluster();
> >> >
> >> > IMO, maybe we should:
> >> > 1)split store_request_queue() into multiple works.
> >> > 2)replace strong consistency with eventual consistency or casual consistency.
> >> >
> >> > any comments?
> >> >
> >> > thanks
> >>
> >>
> >> I think it is not the time to introduce other consistency models which
> >> bring in much complexity.
> >>
> >> Whatever consistency model you use, you still need to handle EIO. IMO,
> >
> > It is completely wrong to set SD_RES_EIO when timeout occurs because
> > the error means disk I/O errors.  We must set SD_RES_NETWORK_ERROR in
> > this case so that the request will be retried after epoch is updated.
> >
> Yes , it works.
> > But I guess it is better to enable TCP keepalive.  If we use it, the
> > connection will be closed after timeout automatically, so we don't
> > need to change network I/O code at all.
> >
> hmm, I don't think so, to solve this problem , we must enable client side
> tcp keepalive, so we have to modify the network I/O code,

What we need to do is only set socket options in get_sheep_fd() and
listen_handler(), no?

> also tcp keepalive will bring in overhead.

I think the overhead is small enough.

Thanks,

Kazutaka

> IMO, use poll plus timeout setting is the simplest solution
>
> > Thanks,
> >
> > Kazutaka
> >
> >> you could handle EIO even with current strong model. In this case, A is
> >> gone, you could
> >>
> >> 1) timeout the write
> >> 2) wait for the cluster get recovered (get a new hash)
> >> 3) do the write again.
> >>
> >> The newest HEAD have already removed the lines that makes sheep
> >> panic-out in error case. So currently, EIO will leave the node a gateway
> >> for VMs. This is a acceptable compromise.
> >>
> >> Thanks,
> >> Yuan
> >>
> >> --
> >> sheepdog mailing list
> >> sheepdog at lists.wpkg.org
> >> http://lists.wpkg.org/mailman/listinfo/sheepdog
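
For illustration, here is a minimal sketch of the "only set socket options" idea discussed above. It is not taken from the sheepdog tree: set_keepalive() is a hypothetical helper, the timing values in the usage note are arbitrary, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are assumed to be available (they are Linux-specific). The assumption is that such a helper would be called on the connected socket in get_sheep_fd() and on the accepted socket in listen_handler().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of sheepdog): enable TCP keepalive on fd.
 *   idle  - seconds of inactivity before the first probe is sent
 *   intvl - seconds between probes
 *   cnt   - number of unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int set_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	/* Turn keepalive probing on for this socket. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	/* Linux-specific tuning of the probe schedule. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;

	return 0;
}
```

With, say, set_keepalive(fd, 5, 1, 3), a silent peer would be detected roughly 5 + 3 * 1 seconds after the connection goes idle, after which pending I/O on the socket fails with an error instead of waiting for the full 30 second timeout. The caller would still have to map that failure to SD_RES_NETWORK_ERROR rather than SD_RES_EIO.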