On 10/12/2011 07:28 PM, MORITA Kazutaka wrote: > At Tue, 11 Oct 2011 17:27:11 +0800, > Liu Yuan wrote: >> From: Liu Yuan<tailai.ly at taobao.com> >> >> Currently, sheepdog will serve IO requests even if number of nodes is less than 'copies'. >> >> When the number of the nodes (or zones) is less than the copies specified by >> collie-cluster-format command, the sheepdog cluster should stop serving IO requests. >> >> This is necessary to solve the below subtle case: >> >> + good nodes, - failed nodes. >> >> 0 1 2 3 >> + - - + >> + --> - --> - --> + >> + + - #<-- permanently down. >> ^ >> | >> this node has the latest data >> >> at stage 3, we will have a cluster recovered without the data tracked at stage 1. >> >> When the nodes are in the SD_STATUS_HALT, the sheepdog can also serve configuration change >> and do the recovery job. >> >> Signed-off-by: Liu Yuan<tailai.ly at taobao.com> >> --- >> include/sheep.h | 1 + >> include/sheepdog_proto.h | 1 + >> sheep/group.c | 27 ++++++++++++++++++++++----- >> sheep/sheep_priv.h | 1 + >> 4 files changed, 25 insertions(+), 5 deletions(-) >> >> diff --git a/include/sheep.h b/include/sheep.h >> index 31516d9..943cdf7 100644 >> --- a/include/sheep.h >> +++ b/include/sheep.h >> @@ -254,6 +254,7 @@ static inline const char *sd_strerror(int err) >> {SD_RES_WAIT_FOR_FORMAT, "Waiting for a format operation"}, >> {SD_RES_WAIT_FOR_JOIN, "Waiting for other nodes joining"}, >> {SD_RES_JOIN_FAILED, "The node had failed to join sheepdog"}, >> + {SD_RES_HALT, "The node is stopped doing IO, short of living nodes"}, >> >> {SD_RES_OLD_NODE_VER, "Remote node has an old epoch"}, >> {SD_RES_NEW_NODE_VER, "Remote node has a new epoch"}, >> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h >> index 2b042f4..a5a41d0 100644 >> --- a/include/sheepdog_proto.h >> +++ b/include/sheepdog_proto.h >> @@ -58,6 +58,7 @@ >> #define SD_RES_WAIT_FOR_FORMAT 0x16 /* Sheepdog is waiting for a format operation */ >> #define SD_RES_WAIT_FOR_JOIN 0x17 /* Sheepdog is waiting for other nodes joining */ >> #define SD_RES_JOIN_FAILED 0x18 /* Target node had failed to join sheepdog */ >> +#define SD_RES_HALT 0x19 /* Target node is stopped doing IO */ >> >> /* >> * Object ID rules >> diff --git a/sheep/group.c b/sheep/group.c >> index f6743f5..59293b2 100644 >> --- a/sheep/group.c >> +++ b/sheep/group.c >> @@ -335,6 +335,9 @@ void cluster_queue_request(struct work *work, int idx) >> case SD_STATUS_JOIN_FAILED: >> ret = SD_RES_JOIN_FAILED; >> break; >> + case SD_STATUS_HALT: >> + ret = SD_RES_HALT; >> + break; >> default: >> ret = SD_RES_SYSTEM_ERROR; >> break; >> @@ -639,6 +642,10 @@ static int get_cluster_status(struct sheepdog_node_list_entry *from, >> break; >> case SD_STATUS_SHUTDOWN: >> return SD_RES_SHUTDOWN; >> + case SD_STATUS_HALT: >> + if (inc_epoch); >> + *inc_epoch = 1; >> + break; > We should check epoch and ctime of the joining node. Otherwise, > invalid nodes can join the cluster. > >> default: >> break; >> } >> @@ -810,12 +817,13 @@ static void update_cluster_info(struct join_message *msg) >> sheepid_to_str(&msg->nodes[i].sheepid)); >> } >> >> - if (msg->cluster_status != SD_STATUS_OK) >> + if (msg->cluster_status == SD_STATUS_WAIT_FOR_JOIN) >> add_node_to_leave_list((struct message_header *)msg); >> >> sys->join_finished = 1; >> >> - if (msg->cluster_status == SD_STATUS_OK&& msg->inc_epoch) >> + if ((msg->cluster_status == SD_STATUS_OK || msg->cluster_status == SD_STATUS_HALT) >> + && msg->inc_epoch) >> update_epoch_log(sys->epoch); >> >> join_finished: >> @@ -840,6 +848,12 @@ join_finished: >> } >> } >> >> + if (msg->cluster_status == SD_STATUS_HALT&& msg->inc_epoch) { >> + sys->epoch++; >> + update_epoch_log(sys->epoch); >> + update_epoch_store(sys->epoch); >> + } >> + > We need to call set_global_nr_copies() and set_cluster_ctime() here > for newly added nodes. > > Other than above, we must replace "sys->status == SD_STATUS_OK" > with "sys->status == SD_STATUS_OK || sys->status == SD_STATUS_HALT" > in del_node() and __sd_notify_done(), I think. > > > Thanks, > > Kazutaka Thanks for the review. I'll cook a patch to address your comments. |