[Sheepdog] [PATCH v2 1/3] sheep: introduce SD_STATUS_HALT

Liu Yuan namei.unix at gmail.com
Thu Oct 13 04:15:09 CEST 2011


On 10/12/2011 07:28 PM, MORITA Kazutaka wrote:
> At Tue, 11 Oct 2011 17:27:11 +0800,
> Liu Yuan wrote:
>> From: Liu Yuan<tailai.ly at taobao.com>
>>
>> Currently, sheepdog will serve IO requests even if the number of nodes is less than 'copies'.
>>
>> When the number of nodes (or zones) is less than the number of copies specified by
>> the collie-cluster-format command, the sheepdog cluster should stop serving IO requests.
>>
>> This is necessary to solve the following subtle case:
>>
>> + good nodes, - failed nodes.
>>
>> 0       1      2     3
>> +       -      -     +
>> +  -->   - -->   - -->  +
>> +       +      -     #<-- permanently down.
>>          ^
>>          |
>> this node has the latest data
>>
>> At stage 3, we end up with a recovered cluster that lacks the data tracked at stage 1.
>>
>> While the nodes are in SD_STATUS_HALT, sheepdog can still handle configuration changes
>> and do the recovery job.
>>
>> Signed-off-by: Liu Yuan<tailai.ly at taobao.com>
>> ---
>>   include/sheep.h          |    1 +
>>   include/sheepdog_proto.h |    1 +
>>   sheep/group.c            |   27 ++++++++++++++++++++++-----
>>   sheep/sheep_priv.h       |    1 +
>>   4 files changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/sheep.h b/include/sheep.h
>> index 31516d9..943cdf7 100644
>> --- a/include/sheep.h
>> +++ b/include/sheep.h
>> @@ -254,6 +254,7 @@ static inline const char *sd_strerror(int err)
>>   		{SD_RES_WAIT_FOR_FORMAT, "Waiting for a format operation"},
>>   		{SD_RES_WAIT_FOR_JOIN, "Waiting for other nodes joining"},
>>   		{SD_RES_JOIN_FAILED, "The node had failed to join sheepdog"},
>> +		{SD_RES_HALT, "The node is stopped doing IO, short of living nodes"},
>>
>>   		{SD_RES_OLD_NODE_VER, "Remote node has an old epoch"},
>>   		{SD_RES_NEW_NODE_VER, "Remote node has a new epoch"},
>> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
>> index 2b042f4..a5a41d0 100644
>> --- a/include/sheepdog_proto.h
>> +++ b/include/sheepdog_proto.h
>> @@ -58,6 +58,7 @@
>>   #define SD_RES_WAIT_FOR_FORMAT  0x16 /* Sheepdog is waiting for a format operation */
>>   #define SD_RES_WAIT_FOR_JOIN    0x17 /* Sheepdog is waiting for other nodes joining */
>>   #define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
>> +#define SD_RES_HALT 0x19 /* Target node is stopped doing IO */
>>
>>   /*
>>    * Object ID rules
>> diff --git a/sheep/group.c b/sheep/group.c
>> index f6743f5..59293b2 100644
>> --- a/sheep/group.c
>> +++ b/sheep/group.c
>> @@ -335,6 +335,9 @@ void cluster_queue_request(struct work *work, int idx)
>>   		case SD_STATUS_JOIN_FAILED:
>>   			ret = SD_RES_JOIN_FAILED;
>>   			break;
>> +		case SD_STATUS_HALT:
>> +			ret = SD_RES_HALT;
>> +			break;
>>   		default:
>>   			ret = SD_RES_SYSTEM_ERROR;
>>   			break;
>> @@ -639,6 +642,10 @@ static int get_cluster_status(struct sheepdog_node_list_entry *from,
>>   		break;
>>   	case SD_STATUS_SHUTDOWN:
>>   		return SD_RES_SHUTDOWN;
>> +	case SD_STATUS_HALT:
>> +		if (inc_epoch)
>> +			*inc_epoch = 1;
>> +		break;
> We should check epoch and ctime of the joining node.  Otherwise,
> invalid nodes can join the cluster.
>
>>   	default:
>>   		break;
>>   	}
>> @@ -810,12 +817,13 @@ static void update_cluster_info(struct join_message *msg)
>>   				sheepid_to_str(&msg->nodes[i].sheepid));
>>   	}
>>
>> -	if (msg->cluster_status != SD_STATUS_OK)
>> +	if (msg->cluster_status == SD_STATUS_WAIT_FOR_JOIN)
>>   		add_node_to_leave_list((struct message_header *)msg);
>>
>>   	sys->join_finished = 1;
>>
>> -	if (msg->cluster_status == SD_STATUS_OK && msg->inc_epoch)
>> +	if ((msg->cluster_status == SD_STATUS_OK || msg->cluster_status == SD_STATUS_HALT)
>> +	&& msg->inc_epoch)
>>   		update_epoch_log(sys->epoch);
>>
>>   join_finished:
>> @@ -840,6 +848,12 @@ join_finished:
>>   		}
>>   	}
>>
>> +	if (msg->cluster_status == SD_STATUS_HALT && msg->inc_epoch) {
>> +		sys->epoch++;
>> +		update_epoch_log(sys->epoch);
>> +		update_epoch_store(sys->epoch);
>> +	}
>> +
> We need to call set_global_nr_copies() and set_cluster_ctime() here
> for newly added nodes.
>
> Other than above, we must replace "sys->status == SD_STATUS_OK"
> with "sys->status == SD_STATUS_OK || sys->status == SD_STATUS_HALT"
> in del_node() and __sd_notify_done(), I think.
>
>
> Thanks,
>
> Kazutaka
Thanks for the review. I'll cook a patch to address your comments.
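
To make the intended semantics concrete, here is a minimal sketch of the rule discussed above. It is not the actual sheep code: the sketch_* names and the enum are hypothetical and stand in for the real SD_STATUS_* constants and the checks in group.c. It only shows the halt condition (fewer live nodes than copies) and the "OK or HALT" test suggested for del_node() and __sd_notify_done().

```c
#include <stdbool.h>

/*
 * Hypothetical status values for illustration only; the real
 * SD_STATUS_* constants live in the sheepdog headers.
 */
enum sketch_status {
	SKETCH_STATUS_OK,
	SKETCH_STATUS_HALT,
};

/* Halt when fewer nodes (or zones) are alive than the configured copies. */
static enum sketch_status sketch_cluster_status(int nr_alive, int nr_copies)
{
	return nr_alive < nr_copies ? SKETCH_STATUS_HALT : SKETCH_STATUS_OK;
}

/* IO requests are served only while the cluster is in the OK state. */
static bool sketch_can_serve_io(enum sketch_status status)
{
	return status == SKETCH_STATUS_OK;
}

/* Membership changes and recovery proceed in both OK and HALT. */
static bool sketch_can_handle_membership(enum sketch_status status)
{
	return status == SKETCH_STATUS_OK || status == SKETCH_STATUS_HALT;
}
```

With copies set to 3, two live nodes would give the halt state: IO is refused, but a joining node can still be processed, so the cluster can return to the OK state once enough nodes are back.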


