[sheepdog] [PATCH 1/2] sheep: fix hang when IO NIC is down only
Liu Yuan
namei.unix at gmail.com
Tue Jan 15 12:41:09 CET 2013
From: Liu Yuan <tailai.ly at taobao.com>
If the IO NIC is down but sheep is still alive, the epoch isn't incremented,
so we must not retry poll forever.
This problem can be demonstrated by 050.
Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
sheep/gateway.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/sheep/gateway.c b/sheep/gateway.c
index 2bd6e78..0870bc2 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -149,7 +149,7 @@ static inline void pfd_info_init(struct write_info *wi, struct pfd_info *pi)
*/
static int wait_forward_request(struct write_info *wi, struct request *req)
{
- int nr_sent, err_ret = SD_RES_SUCCESS, ret, pollret, i;
+ int nr_sent, err_ret = SD_RES_SUCCESS, ret, pollret, i, repeat = 6;
struct pfd_info pi;
struct sd_rsp *rsp = &req->rp;
again:
@@ -163,8 +163,14 @@ again:
} else if (pollret == 0) {
eprintf("poll timeout %d\n", wi->nr_sent);
- if (req->rq.epoch == sys_epoch())
+ /*
+ * If IO NIC is down, epoch isn't incremented, so we can't retry
+ * for ever.
+ */
+ if (req->rq.epoch == sys_epoch() && repeat) {
+ repeat--;
goto again;
+ }
nr_sent = wi->nr_sent;
/* XXX Blinedly close all the connections */
--
1.7.9.5
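
For illustration only, the bounded-retry idea in the patch above can be sketched as a standalone program fragment. This is a minimal sketch, not sheepdog code: `poll_bounded`, `max_retries` and `attempts` are hypothetical names, the epoch check is left out, and EINTR handling is omitted for brevity.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of sheepdog): call poll() and, on timeout,
 * retry at most max_retries more times instead of looping forever.
 * Returns the result of the last poll() call: > 0 if some fd is ready,
 * 0 if every attempt timed out, -1 on error.
 * *attempts is set to the number of poll() calls that were made.
 */
static int poll_bounded(struct pollfd *fds, nfds_t nfds, int timeout_ms,
			int max_retries, int *attempts)
{
	int repeat = max_retries;
	int ret;

	*attempts = 0;
again:
	ret = poll(fds, nfds, timeout_ms);
	(*attempts)++;
	/* Retry only on timeout, and only while the budget lasts. */
	if (ret == 0 && repeat > 0) {
		repeat--;
		goto again;
	}
	return ret;
}
```

With max_retries set to 6, as in the patch, a peer that never answers costs at most seven poll timeouts before the caller falls through to its error path.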