[Sheepdog] [PATCH v2 1/3] sheep: introduce SD_STATUS_HALT
Liu Yuan
namei.unix at gmail.com
Thu Oct 13 04:15:09 CEST 2011
On 10/12/2011 07:28 PM, MORITA Kazutaka wrote:
> At Tue, 11 Oct 2011 17:27:11 +0800,
> Liu Yuan wrote:
>> From: Liu Yuan <tailai.ly at taobao.com>
>>
>> Currently, sheepdog serves IO requests even if the number of nodes is less than 'copies'.
>>
>> When the number of nodes (or zones) is less than the number of copies specified by
>> the 'collie cluster format' command, the sheepdog cluster should stop serving IO requests.
>>
>> This is necessary to handle the following subtle case:
>>
>> + good nodes, - failed nodes.
>>
>>     0       1       2       3
>>     +       -       -       +
>>     +  -->  -  -->  -  -->  +
>>     +       +       -       #  <-- permanently down.
>>             ^
>>             |
>>      this node has the latest data
>>
>> At stage 3, the cluster would recover without the data written at stage 1,
>> because the only node holding that data is permanently down.
>>
>> While the nodes are in SD_STATUS_HALT, sheepdog can still handle configuration
>> changes and perform recovery.
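The halt condition described above can be sketched in C. This is a minimal sketch, not the code from this patch: `struct node_info`, `count_zones()` and `cluster_should_halt()` are hypothetical names, and it assumes every live node carries a zone id (a node without an explicit zone would need a unique id assigned by the caller). With copies = 3, stage 0 of the diagram (three nodes in three zones) keeps serving IO, while stage 1 (one live node) halts, so the write that would later be lost is never accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node record: only the zone id matters here. */
struct node_info {
	uint32_t zone;
};

/* Count distinct zones among the live nodes (quadratic, fine for a sketch). */
static size_t count_zones(const struct node_info *nodes, size_t nr_nodes)
{
	size_t nr_zones = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		bool seen = false;

		/* Has this zone already appeared among nodes[0..i-1]? */
		for (size_t j = 0; j < i; j++) {
			if (nodes[j].zone == nodes[i].zone) {
				seen = true;
				break;
			}
		}
		if (!seen)
			nr_zones++;
	}
	return nr_zones;
}

/*
 * Return true if the cluster should enter the halt state: fewer zones
 * are alive than the number of copies chosen at format time.
 */
static bool cluster_should_halt(const struct node_info *nodes, size_t nr_nodes,
				size_t copies)
{
	assert(copies > 0);
	return count_zones(nodes, nr_nodes) < copies;
}
```

Counting zones rather than nodes matters because several nodes in one zone cannot hold independent copies of the same object.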
>>
>> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
>> ---
>> include/sheep.h | 1 +
>> include/sheepdog_proto.h | 1 +
>> sheep/group.c | 27 ++++++++++++++++++++++-----
>> sheep/sheep_priv.h | 1 +
>> 4 files changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/sheep.h b/include/sheep.h
>> index 31516d9..943cdf7 100644
>> --- a/include/sheep.h
>> +++ b/include/sheep.h
>> @@ -254,6 +254,7 @@ static inline const char *sd_strerror(int err)
>> {SD_RES_WAIT_FOR_FORMAT, "Waiting for a format operation"},
>> {SD_RES_WAIT_FOR_JOIN, "Waiting for other nodes joining"},
>> {SD_RES_JOIN_FAILED, "The node had failed to join sheepdog"},
>> + {SD_RES_HALT, "The node has stopped doing IO, short of living nodes"},
>>
>> {SD_RES_OLD_NODE_VER, "Remote node has an old epoch"},
>> {SD_RES_NEW_NODE_VER, "Remote node has a new epoch"},
>> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
>> index 2b042f4..a5a41d0 100644
>> --- a/include/sheepdog_proto.h
>> +++ b/include/sheepdog_proto.h
>> @@ -58,6 +58,7 @@
>> #define SD_RES_WAIT_FOR_FORMAT 0x16 /* Sheepdog is waiting for a format operation */
>> #define SD_RES_WAIT_FOR_JOIN 0x17 /* Sheepdog is waiting for other nodes joining */
>> #define SD_RES_JOIN_FAILED 0x18 /* Target node had failed to join sheepdog */
>> +#define SD_RES_HALT 0x19 /* Target node has stopped doing IO */
>>
>> /*
>> * Object ID rules
>> diff --git a/sheep/group.c b/sheep/group.c
>> index f6743f5..59293b2 100644
>> --- a/sheep/group.c
>> +++ b/sheep/group.c
>> @@ -335,6 +335,9 @@ void cluster_queue_request(struct work *work, int idx)
>> case SD_STATUS_JOIN_FAILED:
>> ret = SD_RES_JOIN_FAILED;
>> break;
>> + case SD_STATUS_HALT:
>> + ret = SD_RES_HALT;
>> + break;
>> default:
>> ret = SD_RES_SYSTEM_ERROR;
>> break;
>> @@ -639,6 +642,10 @@ static int get_cluster_status(struct sheepdog_node_list_entry *from,
>> break;
>> case SD_STATUS_SHUTDOWN:
>> return SD_RES_SHUTDOWN;
>> + case SD_STATUS_HALT:
>> + if (inc_epoch)
>> + *inc_epoch = 1;
>> + break;
> We should check epoch and ctime of the joining node. Otherwise,
> invalid nodes can join the cluster.
>
>> default:
>> break;
>> }
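One way to act on the review comment about checking the joining node's epoch and ctime while the cluster is in SD_STATUS_HALT is sketched below. It is an illustration under stated assumptions, not sheepdog code: `struct cluster_view`, `enum join_verdict` and `check_join_in_halt()` are hypothetical, and it assumes a never-formatted node reports ctime 0 and epoch 0 so that genuinely new nodes can still be added. A node formatted for another cluster, or one whose epoch disagrees with the cluster's, is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for this sketch (not sheepdog's real result codes). */
enum join_verdict {
	JOIN_ACCEPT,
	JOIN_REJECT_CTIME,	/* node was formatted for a different cluster */
	JOIN_REJECT_OLD_EPOCH,	/* node has stale membership history */
	JOIN_REJECT_NEW_EPOCH,	/* node has seen epochs the cluster has not */
};

/* Hypothetical snapshot of what a node reports about itself. */
struct cluster_view {
	uint64_t ctime;		/* cluster creation time, set at format */
	uint32_t epoch;		/* latest epoch recorded by the node */
};

/*
 * Validate a joining node while the cluster is halted. The caller would
 * only bump the epoch (inc_epoch) when this returns JOIN_ACCEPT.
 */
static enum join_verdict check_join_in_halt(const struct cluster_view *cluster,
					    const struct cluster_view *joiner)
{
	assert(cluster != NULL && joiner != NULL);

	/* Assumption: a never-formatted node reports ctime 0 and epoch 0. */
	if (joiner->ctime == 0 && joiner->epoch == 0)
		return JOIN_ACCEPT;
	if (joiner->ctime != cluster->ctime)
		return JOIN_REJECT_CTIME;
	if (joiner->epoch < cluster->epoch)
		return JOIN_REJECT_OLD_EPOCH;
	if (joiner->epoch > cluster->epoch)
		return JOIN_REJECT_NEW_EPOCH;
	return JOIN_ACCEPT;
}
```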
>> @@ -810,12 +817,13 @@ static void update_cluster_info(struct join_message *msg)
>> sheepid_to_str(&msg->nodes[i].sheepid));
>> }
>>
>> - if (msg->cluster_status != SD_STATUS_OK)
>> + if (msg->cluster_status == SD_STATUS_WAIT_FOR_JOIN)
>> add_node_to_leave_list((struct message_header *)msg);
>>
>> sys->join_finished = 1;
>>
>> - if (msg->cluster_status == SD_STATUS_OK && msg->inc_epoch)
>> + if ((msg->cluster_status == SD_STATUS_OK || msg->cluster_status == SD_STATUS_HALT)
>> + && msg->inc_epoch)
>> update_epoch_log(sys->epoch);
>>
>> join_finished:
>> @@ -840,6 +848,12 @@ join_finished:
>> }
>> }
>>
>> + if (msg->cluster_status == SD_STATUS_HALT && msg->inc_epoch) {
>> + sys->epoch++;
>> + update_epoch_log(sys->epoch);
>> + update_epoch_store(sys->epoch);
>> + }
>> +
> We need to call set_global_nr_copies() and set_cluster_ctime() here
> for newly added nodes.
>
> Other than the above, we must replace "sys->status == SD_STATUS_OK"
> with "sys->status == SD_STATUS_OK || sys->status == SD_STATUS_HALT"
> in del_node() and __sd_notify_done(), I think.
>
>
> Thanks,
>
> Kazutaka
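The status change suggested for del_node() and __sd_notify_done() amounts to treating SD_STATUS_HALT like SD_STATUS_OK for membership handling while still refusing client IO. A minimal sketch of that split follows; `enum cluster_status` and its values are hypothetical stand-ins for the real SD_STATUS_* constants, and `status_after_membership_change()` is a hypothetical helper that moves the cluster between OK and HALT after a node joins or leaves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of cluster states; values are illustrative only. */
enum cluster_status {
	STATUS_OK,
	STATUS_WAIT_FOR_FORMAT,
	STATUS_WAIT_FOR_JOIN,
	STATUS_SHUTDOWN,
	STATUS_JOIN_FAILED,
	STATUS_HALT,
};

/*
 * A halted cluster still processes configuration changes and recovery,
 * so membership code paths accept both OK and HALT.
 */
static bool handles_membership(enum cluster_status status)
{
	return status == STATUS_OK || status == STATUS_HALT;
}

/* Only a fully operational cluster serves client IO. */
static bool serves_io(enum cluster_status status)
{
	return status == STATUS_OK;
}

/*
 * Recompute the status after a node joins or leaves. States that do not
 * handle membership (waiting, shutdown, join failed) are left unchanged.
 */
static enum cluster_status
status_after_membership_change(enum cluster_status cur, unsigned nr_zones,
			       unsigned copies)
{
	assert(copies > 0);

	if (!handles_membership(cur))
		return cur;
	return nr_zones < copies ? STATUS_HALT : STATUS_OK;
}
```

Keeping the two predicates separate makes it harder to accidentally let a halted node accept writes while fixing the membership paths.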
Thanks for the review. I'll cook a patch to address your comments.