[sheepdog] [PATCH v2 4/4] cluster driver: handle pending block/notify event during reconnect
Liu Yuan
namei.unix at gmail.com
Mon Jul 8 10:42:56 CEST 2013
On Sun, Jul 07, 2013 at 09:20:51PM -0700, Kai Zhang wrote:
> Current implementation of reconnection doesn't handle pending block/notify
> event.
>
> It is easy to handle notify event by sending it again.
>
> However, it is a little bit complex for block event.
> This is because a block event need 4 steps.
> 1. in queue_cluster_request(), send block event by sys->cdrv->block(), and
> add to pending_block_list.
> 2. in sd_block_handler(), queue the event to work queue of 'block' thread.
> 3. in cluster_op_done(), send unblock event by sys->cdrv->unblock().
> 4. in sd_notify_handler(), remove it from pending_block_list.
>
> And step 1 and 3 contains broadcast operations.
> So we have to know which step has been done for a pending block event.
>
> If step 1 has been done, we can re-queue it simply. (Any block event which sent
> by this node have been removed due to the leave event)
> If step 2 has been done, the event is handling by another thread. We have to mark
> it as 'drop' so that it will be dropped when cluster_op_done() is called later.
> if step 3 has been done, we should call sd_notify_handler() manually to finish
> it.
>
> Signed-off-by: Kai Zhang <kyle at zelin.io>
> ---
> sheep/group.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> sheep/sheep_priv.h | 8 ++++++
> 2 files changed, 75 insertions(+), 2 deletions(-)
>
> diff --git a/sheep/group.c b/sheep/group.c
> index 2fa4091..1a549de 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -251,6 +251,9 @@ static void cluster_op_done(struct work *work)
> struct vdi_op_message *msg;
> size_t size;
>
> + if (req->status == REQUEST_DROPPED)
> + goto drop;
> +
> sd_dprintf("%s (%p)", op_name(req->op), req);
>
> msg = prepare_cluster_msg(req, &size);
> @@ -266,6 +269,13 @@ static void cluster_op_done(struct work *work)
> }
>
> free(msg);
> + req->status = REQUEST_DONE;
> + return;
> +drop:
> + list_del(&req->pending_list);
> + req->rp.result = SD_RES_CLUSTER_ERROR;
> + put_request(req);
> + cluster_op_running = false;
> }
>
> /*
> @@ -295,6 +305,7 @@ bool sd_block_handler(const struct sd_node *sender)
> req->work.done = cluster_op_done;
>
> queue_work(sys->block_wqueue, &req->work);
> + req->status = REQUEST_QUEUED;
> return true;
> }
>
> @@ -329,7 +340,7 @@ void queue_cluster_request(struct request *req)
>
> free(msg);
> }
> -
> + req->status = REQUEST_INIT;
> return;
> error:
> req->rp.result = SD_RES_CLUSTER_ERROR;
> @@ -927,6 +938,60 @@ static int send_join_request(struct sd_node *ent)
> return ret;
> }
>
> +static void requeue_cluster_request(void)
> +{
> + struct request *req, *p;
> + struct vdi_op_message *msg;
> + size_t size;
> +
> + list_for_each_entry_safe(req, p, main_thread_get(pending_notify_list),
> + pending_list) {
> + sd_dprintf("found an pending notify request, op: %s",
> + op_name(req->op));
> + /*
> + * notify has been sent but sd_notify_handler is never called,
> + * re-queue it
> + */
> + list_del(&req->pending_list);
> + queue_cluster_request(req);
> + }
> +
> + list_for_each_entry_safe(req, p, main_thread_get(pending_block_list),
> + pending_list) {
> + switch (req->status) {
> + case REQUEST_INIT:
> + /* this request has never been executed, re-queue it */
> + sd_dprintf("requeue a block request, op: %s",
> + op_name(req->op));
> + list_del(&req->pending_list);
> + queue_cluster_request(req);
> + break;
> + case REQUEST_QUEUED:
> + /*
> + * This request is handling by the 'block' thread.
s/handling/being handled/
I'd suggest putting it more explicitly:
---
This request is being handled by the 'block' thread and ->unblock() isn't called
yet. We can't call ->unblock thereafter because other sheep has unblocked
themselves due to cluster driver session timeout. Mark it as dropped to stop
cluster_op_done() from calling ->unblock.
---
> + * Don't send unblock event, because other sheep has
> + * unblocked it. Drop it when cluster_op_done() is
> + * called.
> + */
> + sd_dprintf("drop pending block request, op: %s",
> + op_name(req->op));
> + req->status = REQUEST_DROPPED;
> + break;
> + case REQUEST_DONE:
> + /*
> + * Unblock has been sent but sd_notify_handler is never
> + * called, call sd_notify_handler to finish it.
> + */
Would better rephrase as:
---
->unblock() was called and succeeded but after that this node session-timeouted
and sd_notify_handler wasn't called from unblock event handler in cluster driver.
We manually call sd_notify_handler to finish the request.
----
Other parts look good to me
Thanks
Yuan
More information about the sheepdog
mailing list