[sheepdog] [PATCH stable-0.8] sheep: fix sheep sometimes cannot be killed successfully

Hitoshi Mitake mitake.hitoshi at lab.ntt.co.jp
Thu Aug 14 08:17:11 CEST 2014


From: Ruoyu <liangry at ucweb.com>

I am sure this is a bug: the variable nr_outstanding_reqs is
sometimes not reset to zero after a client cancels a request.
While nr_outstanding_reqs is nonzero, the sheep process can
never be killed, either with kill <pid> or with the
dog node kill command.
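
To illustrate the symptom, here is a minimal sketch of how a leaked
request counter can block shutdown. Only nr_outstanding_reqs is taken
from sheep; the toy_* names and the exit condition are hypothetical
stand-ins, and the real exit check in sheep may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the daemon's global state. */
struct toy_system {
	int nr_outstanding_reqs;	/* requests accepted but not yet released */
	bool kill_requested;		/* set by a signal or a kill command */
};

/* Called when a request is accepted from a client. */
static void toy_get_request(struct toy_system *sys)
{
	sys->nr_outstanding_reqs++;
}

/* Called when a request is released; skipping this leaks the count. */
static void toy_put_request(struct toy_system *sys)
{
	sys->nr_outstanding_reqs--;
}

/*
 * Assumed exit rule: the daemon leaves its main loop only when a kill
 * was requested and no request is still in flight.
 */
static bool toy_can_exit(const struct toy_system *sys)
{
	return sys->kill_requested && sys->nr_outstanding_reqs == 0;
}
```

If a cancelled request never reaches toy_put_request(), the counter
stays above zero and toy_can_exit() keeps returning false, which
matches the behaviour described above.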

However, I am not sure the bug is completely fixed, because
I am not familiar with the sheepdog networking logic. I had to
add error messages to every doubtful statement.

As a result, I caught one of them, so I call the function
clear_client_info at that place. Everything seems fine
after the modification.
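
The pattern of the fix can be sketched as follows, assuming that the
"switch on" call returns nonzero once the connection is closed and that
the cleanup routine releases every request still queued for sending.
All toy_* names are hypothetical; they only mimic the roles of
conn_tx_on and clear_client_info.

```c
#include <stddef.h>

/* Hypothetical finished request, linked into a per-client list. */
struct toy_req {
	struct toy_req *next;
};

struct toy_client {
	int dead;			/* nonzero once the peer has gone away */
	struct toy_req *done_reqs;	/* finished requests waiting to be sent */
	int *outstanding;		/* points at the global request counter */
};

/* Mimics conn_tx_on(): nonzero means the connection is closed. */
static int toy_tx_on(struct toy_client *c)
{
	return c->dead ? -1 : 0;
}

/* Mimics the cleanup role: drop every queued request and its count. */
static void toy_clear_client(struct toy_client *c)
{
	while (c->done_reqs) {
		c->done_reqs = c->done_reqs->next;
		(*c->outstanding)--;
	}
}

/* Queue a finished request, then try to arm sending. */
static void toy_queue_done(struct toy_client *c, struct toy_req *r)
{
	r->next = c->done_reqs;
	c->done_reqs = r;

	/* The request is already on the list, so cleanup releases it. */
	if (toy_tx_on(c) != 0)
		toy_clear_client(c);
}
```

Without the check on toy_tx_on(), a request queued for a dead client
would stay on the list and its count would never be dropped.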

Could anyone help to investigate and fix it?

Signed-off-by: Ruoyu <liangry at ucweb.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
---
 sheep/request.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/sheep/request.c b/sheep/request.c
index 8a71dc2..11a593d 100644
--- a/sheep/request.c
+++ b/sheep/request.c
@@ -710,7 +710,16 @@ main_fn void put_request(struct request *req)
 
 			if (ci->tx_req == NULL)
 				/* There is no request being sent. */
-				conn_tx_on(&ci->conn);
+				if (conn_tx_on(&ci->conn)) {
+					sd_err("switch on sending flag failure, "
+						"connection maybe closed");
+					/*
+					 * should not free_request(req) here
+					 * because it is already in done list
+					 * clear_client_info will free it
+					 */
+					clear_client_info(ci);
+				}
 		}
 	}
 }
@@ -770,7 +779,9 @@ static void rx_main(struct work *work)
 		return;
 	}
 
-	conn_rx_on(&ci->conn);
+	if (conn_rx_on(&ci->conn))
+		sd_err("switch on receiving flag failure, "
+				"connection maybe closed");
 
 	if (is_logging_op(get_sd_op(req->rq.opcode))) {
 		sd_info("req=%p, fd=%d, client=%s:%d, op=%s, data=%s",
@@ -846,7 +857,9 @@ static void tx_main(struct work *work)
 	}
 
 	if (!list_empty(&ci->done_reqs))
-		conn_tx_on(&ci->conn);
+		if (conn_tx_on(&ci->conn))
+			sd_err("switch on sending flag failure, "
+					"connection maybe closed");
 }
 
 static void destroy_client(struct client_info *ci)
@@ -932,8 +945,11 @@ static void client_handler(int fd, int events, void *data)
 		return clear_client_info(ci);
 
 	if (events & EPOLLIN) {
-		if (conn_rx_off(&ci->conn) != 0)
+		if (conn_rx_off(&ci->conn) != 0) {
+			sd_err("switch off receiving flag failure, "
+					"connection maybe closed");
 			return;
+		}
 
 		/*
 		 * Increment refcnt so that the client_info isn't freed while
@@ -946,8 +962,11 @@ static void client_handler(int fd, int events, void *data)
 	}
 
 	if (events & EPOLLOUT) {
-		if (conn_tx_off(&ci->conn) != 0)
+		if (conn_tx_off(&ci->conn) != 0) {
+			sd_err("switch off sending flag failure, "
+					"connection maybe closed");
 			return;
+		}
 
 		assert(ci->tx_req == NULL);
 		ci->tx_req = list_first_entry(&ci->done_reqs, struct request,
-- 
1.8.3.2



