On 06/21/2012 04:12 PM, levin li wrote: > From: levin li <xingke.lwp at taobao.com> > > read_object() calls forward_read_obj_req() to get the object, > but may fail with result SD_RES_OLD_NODE_VER, read_object() do > nothing to handle this error, this patch fixed this problem > by sending a gateway request to local node, making gateway to > retry when error occur. > > Signed-off-by: levin li <xingke.lwp at taobao.com> > --- > sheep/store.c | 47 +++++++++++++++++++++++++++++++---------------- > 1 file changed, 31 insertions(+), 16 deletions(-) > > diff --git a/sheep/store.c b/sheep/store.c > index 52c4716..aae322b 100644 > --- a/sheep/store.c > +++ b/sheep/store.c > @@ -558,9 +558,11 @@ int read_object(struct vnode_info *vnodes, uint32_t epoch, > uint64_t oid, char *data, unsigned int datalen, > uint64_t offset, int nr_copies) > { > - struct request read_req; > - struct sd_req *hdr = &read_req.rq; > - int ret; > + struct sd_req hdr; > + struct sd_rsp *rsp = (struct sd_rsp *)&hdr; > + char host[128]; > + unsigned int wlen = 0, rlen = datalen; > + int fd, ret; > > if (sys->enable_write_cache && object_is_cached(oid)) { > ret = object_cache_read(oid, data, datalen, offset, > @@ -572,23 +574,36 @@ int read_object(struct vnode_info *vnodes, uint32_t epoch, > } > return ret; > } > - memset(&read_req, 0, sizeof(read_req)); > + > forward_read: > - hdr->opcode = SD_OP_READ_OBJ; > - hdr->data_length = datalen; > - hdr->epoch = epoch; > + addr_to_str(host, sizeof(host), sys->this_node.addr, 0); > + fd = connect_to(host, sys->this_node.port); > + if (fd < 0) { > + dprintf("Failed to connect to local node\n"); > + return SD_RES_EIO; Be careful to return SD_RES_EIO, return network err instead for this case. > + } > > - hdr->obj.oid = oid; > - hdr->obj.offset = offset; > - hdr->obj.copies = nr_copies; > + hdr.opcode = SD_OP_READ_OBJ; > + hdr.data_length = datalen; > + hdr.epoch = epoch; > > - read_req.data = data; > - read_req.op = get_sd_op(hdr->opcode); > - read_req.vnodes = vnodes; > + hdr.obj.oid = oid; > + hdr.obj.offset = offset; > + hdr.obj.copies = nr_copies; > > - ret = forward_read_obj_req(&read_req); > - if (ret != SD_RES_SUCCESS) > - eprintf("failed to forward read object %x\n", ret); > + ret = exec_req(fd, &hdr, data, &wlen, &rlen); > + close(fd); > + > + if (ret) { > + dprintf("Failed to read object %" PRIx64 "\n", oid); > + return SD_RES_EIO; return SD_NETWORK_ERR > + } > + > + if (rsp->result != SD_RES_SUCCESS) { > + dprintf("Failed to read object %" PRIx64 " %s\n", oid, > + sd_strerror(rsp->result)); > + return rsp->result; > + } > > return ret; > } > Seems that write/remove_object() and object cache push (which is more fatal) could fail without any retry handling. Those places also need this retry mechanism. By looking more at the code, I think we also need a forward_remove_obj_req() function too. Then forward_write/read/remove_obj_req() can be used as gateway requests which handles retry internally. Also we'd better off cache a unix domain fd(s) to local node in 'sys'( in order to accelerate the connection of this kind. Thanks, Yuan |