[Sheepdog] [RFC PATCH] sheep: add client side timeout support for socket

Liu Yuan namei.unix at gmail.com
Sat Nov 26 12:06:18 CET 2011


On 11/26/2011 05:53 PM, Yibin Shen wrote:

> oops, I found a regression with this patch
> 
> Nov 26 11:19:13 store_queue_request(936) 3, 3, 412ca6000022d8 , 10
> Nov 26 11:19:13 forward_write_obj_req(368) 412ca6000022d8
> Nov 26 11:19:13 store_queue_request_local(843) 3, 412ca6000022d8 , 10
> Nov 26 11:19:43 store_queue_request(967) failed: 3, 3, 412ca6000022d8 , 10, 3
> Nov 26 11:19:43 io_op_done(147) leaving sheepdog cluster
> Nov 26 11:19:43 sd_leave_handler(1291) network partition bug: this sheep should have exited
> Nov 26 11:19:43 log_sigsegv(358) logger pid 9654 exiting abnormally
> 
> 
> For example, if an object has 3 copies and is hashed to (local, node A, node B),
> then during a write operation, if node A leaves the cluster, I/O towards node A
> will time out after 30 seconds.
> But we use a strong consistency model, so the return value of
> store_queue_request will be set to SD_RES_EIO,
> and the io_op_done function (sdnet.c) will then call leave_cluster.
> 
> 144        } else if (is_access_local(req->entry, req->nr_vnodes,
> 145                                   ((struct sd_obj_req *)&req->rq)->oid, copies) &&
> 146                   req->rp.result == SD_RES_EIO) {
> 147                eprintf("leaving sheepdog cluster\n");
> 148                leave_cluster();
> 
> IMO, maybe we should:
> 1) split store_queue_request() into multiple works.
> 2) replace strong consistency with eventual consistency or causal consistency.
> 
> any comments?
> 
> thanks


I don't think this is the time to introduce other consistency models,
which would bring in a lot of complexity.

Whatever consistency model you use, you still need to handle EIO. IMO,
you could handle EIO even with the current strong model. In this case,
where A is gone, you could

1) time out the write
2) wait for the cluster to recover (get a new hash)
3) do the write again.
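
A rough sketch of those three steps, just to illustrate the control
flow. This is not sheepdog code: struct cluster, write_all_copies() and
wait_for_new_epoch() are hypothetical stand-ins for the real membership
and I/O paths, and the result codes are only modeled on the SD_RES_EIO
mentioned above.

```c
#include <assert.h>

/* Hypothetical result codes, modeled on SD_RES_EIO from the quoted text. */
enum { RES_SUCCESS = 0, RES_EIO = 3 };

/* Hypothetical cluster view. */
struct cluster {
	int epoch;         /* bumped on every membership change */
	int stale_targets; /* non-zero while the hash still maps to a dead node */
};

/*
 * Stand-in for the replicated write. With strong consistency, one
 * unreachable copy (after the socket timeout) fails the whole write.
 */
static int write_all_copies(const struct cluster *c)
{
	return c->stale_targets ? RES_EIO : RES_SUCCESS;
}

/*
 * Stand-in for waiting until recovery has produced a new node list.
 * A real implementation would block on a membership event here.
 */
static void wait_for_new_epoch(struct cluster *c)
{
	c->epoch++;
	c->stale_targets = 0; /* the new hash no longer includes the dead node */
}

/* 1) time out the write, 2) wait for the new hash, 3) write again. */
static int write_with_retry(struct cluster *c, int max_retries)
{
	int i, ret;

	assert(max_retries >= 0);

	ret = write_all_copies(c);
	for (i = 0; ret == RES_EIO && i < max_retries; i++) {
		wait_for_new_epoch(c);
		ret = write_all_copies(c);
	}
	return ret;
}
```

The retry count is bounded so that a write which keeps failing still
surfaces EIO to the caller instead of looping forever.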

The newest HEAD has already removed the lines that make sheep
panic out in the error case. So currently, EIO will leave the node as a
gateway for VMs. This is an acceptable compromise.

Thanks,
Yuan