[sheepdog] [PATCH 1/2] sheep: fix hang when IO NIC is down only

Liu Yuan namei.unix at gmail.com
Tue Jan 15 12:41:09 CET 2013


From: Liu Yuan <tailai.ly at taobao.com>

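If the IO NIC is down but the sheep daemon is still alive, the epoch isn't
incremented, so the poll timeout path retries forever and the request hangs.
Limit the number of retries to 6 so that the request stops polling instead of
hanging.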

This problem can be demonstrated by 050.

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
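The bounded-retry idea can be tried in isolation with the sketch below. It is
only an illustration under stated assumptions, not the sheep code:
wait_bounded(), poll_attempts and the WAIT_* values are hypothetical names, a
single fd stands in for the forwarded connections, and the epoch comparison
done in wait_forward_request() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not sheepdog's SD_RES_*). */
enum { WAIT_OK = 0, WAIT_GAVE_UP = -1, WAIT_ERROR = -2 };

/* Counts poll() calls so the retry bound can be observed from outside. */
static int poll_attempts;

/*
 * Wait for fd to become readable. After a timeout, retry at most
 * max_retries more times, then give up instead of looping forever.
 */
static int wait_bounded(int fd, int timeout_ms, int max_retries)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int repeat = max_retries;

	assert(max_retries >= 0);

	for (;;) {
		int ret;

		pfd.revents = 0;
		ret = poll(&pfd, 1, timeout_ms);
		poll_attempts++;

		if (ret > 0)
			return WAIT_OK;		/* fd reported an event */
		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, poll again */
			return WAIT_ERROR;
		}

		/* ret == 0: timeout. Retry only while the budget lasts. */
		if (repeat == 0)
			return WAIT_GAVE_UP;
		repeat--;
	}
}
```

With max_retries set to 6, as in this patch, a silent peer is polled at most
seven times (the first attempt plus six retries) before the caller gets
control back.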
 sheep/gateway.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/sheep/gateway.c b/sheep/gateway.c
index 2bd6e78..0870bc2 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -149,7 +149,7 @@ static inline void pfd_info_init(struct write_info *wi, struct pfd_info *pi)
  */
 static int wait_forward_request(struct write_info *wi, struct request *req)
 {
-	int nr_sent, err_ret = SD_RES_SUCCESS, ret, pollret, i;
+	int nr_sent, err_ret = SD_RES_SUCCESS, ret, pollret, i, repeat = 6;
 	struct pfd_info pi;
 	struct sd_rsp *rsp = &req->rp;
 again:
@@ -163,8 +163,14 @@ again:
 	} else if (pollret == 0) {
 		eprintf("poll timeout %d\n", wi->nr_sent);
 
-		if (req->rq.epoch == sys_epoch())
+		/*
+		 * If IO NIC is down, epoch isn't incremented, so we can't retry
+		 * for ever.
+		 */
+		if (req->rq.epoch == sys_epoch() && repeat) {
+			repeat--;
 			goto again;
+		}
 
 		nr_sent = wi->nr_sent;
 		/* XXX Blinedly close all the connections */
-- 
1.7.9.5



