[sheepdog] [PATCH] sheep: fix sheep sometimes cannot be killed successfully

Hitoshi Mitake mitake.hitoshi at gmail.com
Mon Jul 14 14:25:22 CEST 2014


At Fri, 11 Jul 2014 14:29:46 +0800,
Ruoyu wrote:
> 
> I am sure this is a bug because the variable nr_outstanding_reqs
> does not reset to zero sometimes after client cancelling the
> request. Once nr_outstanding_reqs is not zero, sheep process
> never be killed successfully, neither using kill <pid> nor
> using dog node kill command.
> 
> But I am not sure whether the bug is fixed perfectly because
> I am not familiar with the sheepdog networking logic. I have to
> add some error messages to every doutful statements.
> 
> The result is, I caught one of them. So, I call the function
> clear_client_info in that place. It seems every thing is fine
> after the modification.
> 
> Does anyone help to investigate and fix it?
> 
> Signed-off-by: Ruoyu <liangry at ucweb.com>
> ---
>  sheep/request.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)

Applied, thanks.
Hitoshi

> 
> diff --git a/sheep/request.c b/sheep/request.c
> index eb72b00..2a24b93 100644
> --- a/sheep/request.c
> +++ b/sheep/request.c
> @@ -736,7 +736,16 @@ main_fn void put_request(struct request *req)
>  
>  			if (ci->tx_req == NULL)
>  				/* There is no request being sent. */
> -				conn_tx_on(&ci->conn);
> +				if (conn_tx_on(&ci->conn)) {
> +					sd_err("switch on sending flag failure, "
> +						"connection maybe closed");
> +					/*
> +					 * should not free_request(req) here
> +					 * because it is already in done list
> +					 * clear_client_info will free it
> +					 */
> +					clear_client_info(ci);
> +				}
>  		}
>  	}
>  }
> @@ -801,7 +810,9 @@ static void rx_main(struct work *work)
>  		return;
>  	}
>  
> -	conn_rx_on(&ci->conn);
> +	if (conn_rx_on(&ci->conn))
> +		sd_err("switch on receiving flag failure, "
> +				"connection maybe closed");
>  
>  	if (is_logging_op(get_sd_op(req->rq.opcode))) {
>  		sd_info("req=%p, fd=%d, client=%s:%d, op=%s, data=%s",
> @@ -878,7 +889,9 @@ static void tx_main(struct work *work)
>  	}
>  
>  	if (!list_empty(&ci->done_reqs))
> -		conn_tx_on(&ci->conn);
> +		if (conn_tx_on(&ci->conn))
> +			sd_err("switch on sending flag failure, "
> +					"connection maybe closed");
>  }
>  
>  static void destroy_client(struct client_info *ci)
> @@ -957,8 +970,11 @@ static void client_handler(int fd, int events, void *data)
>  		return clear_client_info(ci);
>  
>  	if (events & EPOLLIN) {
> -		if (conn_rx_off(&ci->conn) != 0)
> +		if (conn_rx_off(&ci->conn) != 0) {
> +			sd_err("switch off receiving flag failure, "
> +					"connection maybe closed");
>  			return;
> +		}
>  
>  		/*
>  		 * Increment refcnt so that the client_info isn't freed while
> @@ -971,8 +987,11 @@ static void client_handler(int fd, int events, void *data)
>  	}
>  
>  	if (events & EPOLLOUT) {
> -		if (conn_tx_off(&ci->conn) != 0)
> +		if (conn_tx_off(&ci->conn) != 0) {
> +			sd_err("switch off sending flag failure, "
> +					"connection maybe closed");
>  			return;
> +		}
>  
>  		assert(ci->tx_req == NULL);
>  		ci->tx_req = list_first_entry(&ci->done_reqs, struct request,
> -- 
> 1.8.3.2
> 
> 
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog



More information about the sheepdog mailing list