[sheepdog] [PATCH] sheep: add epoch recovery to reduce GET_EPOCH error

Ruoyu liangry at ucweb.com
Mon Jul 28 07:48:41 CEST 2014


On 2014年07月28日 12:20, Hitoshi Mitake wrote:
> At Mon, 28 Jul 2014 10:41:46 +0800,
> Ruoyu wrote:
>>
>> On 2014年07月27日 16:59, Hitoshi Mitake wrote:
>>> On Tue, Jul 8, 2014 at 12:04 PM, Ruoyu <liangry at ucweb.com> wrote:
>>>> If cluster members change frequently, especially the first two or
>>>> more nodes of the cluster leave and join frequently, the following
>>>> error message will be found in sheep.log
>>>>
>>>> Failed to find requested tag, remote address: 127.0.0.1:7001,
>>>> op name: GET_EPOCH
>>>>
>>>> This patch adds epoch recovery to reduce these errors. To achieve
>>>> it, it is better to unify epoch file because rb_node is unused
>>>> in it. And then, we can create a new epoch file as long as the file
>>>> is not found in local node.
>>> The essential problem of the bloated epoch file is the design of
>>> struct sd_node. The data structure is sent/receive via networks and
>>> stored on disks, but it contains struct rb_node which contains
>>> pointers. Of course the pointers are completely meaningless in
>>> different processes. I prefer redesigning the struct sd_node than
>>> ad-hoc solution (the changes for sheep/store.c). How do you think?
>>>
>>> The changes for sheep/group.c looks good to me. If you can extract
>>> this part and send v2, I'll apply it soon.
>> Redesigning struct sd_node will affect many modules, therefore we need 
>> to think over it's risk and workload. Moreover, it's size is only 
>> hundreds of bytes currently and I think it is acceptable even if the 
>> sheepdog cluster has hundreds of nodes.
>>
>> On the other hand, the main purpose of this patch is to add epoch file 
>> recovery when it is missing which group.c will do. If you accept it, why 
>> don't you expect the content of the epoch file on the same epoch is 
>> exactly same?
> So do you mean that the zero-filling rb_node from epoch file is for
> ensuring epoch files on the same epoch should be exactlly same? If it
> is the purpose of the change for sheep/store.c, I agree with your
> opinion. Could you rebase it on the latest master?
Yes, exactly :)
>
> Thanks,
> Hitoshi
>
>>> Thanks,
>>> Hitoshi
>>>
>>>> Signed-off-by: Ruoyu <liangry at ucweb.com>
>>>> ---
>>>>   sheep/group.c |  7 +++++--
>>>>   sheep/store.c | 11 ++++++++++-
>>>>   2 files changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>> index 27e3574..6e4f8ad 100644
>>>> --- a/sheep/group.c
>>>> +++ b/sheep/group.c
>>>> @@ -359,7 +359,7 @@ int epoch_log_read_remote(uint32_t epoch, struct sd_node *nodes, int len,
>>>>          rb_for_each_entry(node, &vinfo->nroot, rb) {
>>>>                  struct sd_req hdr;
>>>>                  struct sd_rsp *rsp = (struct sd_rsp *)&hdr;
>>>> -               int nodes_len;
>>>> +               int nodes_len, nr_nodes;
>>>>
>>>>                  if (node_is_local(node))
>>>>                          continue;
>>>> @@ -382,7 +382,10 @@ int epoch_log_read_remote(uint32_t epoch, struct sd_node *nodes, int len,
>>>>                          memcpy(timestamp, buf + nodes_len, sizeof(*timestamp));
>>>>
>>>>                  free(buf);
>>>> -               return nodes_len / sizeof(struct sd_node);
>>>> +               nr_nodes = nodes_len / sizeof(struct sd_node);
>>>> +               /* epoch file is missing in local node, try to create one */
>>>> +               update_epoch_log(epoch, nodes, nr_nodes);
>>>> +               return nr_nodes;
>>>>          }
>>>>
>>>>          /*
>>>> diff --git a/sheep/store.c b/sheep/store.c
>>>> index 70fddb8..87913d4 100644
>>>> --- a/sheep/store.c
>>>> +++ b/sheep/store.c
>>>> @@ -19,7 +19,7 @@ LIST_HEAD(store_drivers);
>>>>
>>>>   int update_epoch_log(uint32_t epoch, struct sd_node *nodes, size_t nr_nodes)
>>>>   {
>>>> -       int ret, len, nodes_len;
>>>> +       int ret, len, nodes_len, i;
>>>>          time_t t;
>>>>          char path[PATH_MAX], *buf;
>>>>
>>>> @@ -33,6 +33,15 @@ int update_epoch_log(uint32_t epoch, struct sd_node *nodes, size_t nr_nodes)
>>>>          memcpy(buf, nodes, nodes_len);
>>>>          memcpy(buf + nodes_len, &t, sizeof(time_t));
>>>>
>>>> +       /*
>>>> +        * rb_node is unused in epoch file,
>>>> +        * unifying it is good for epoch recovery
>>>> +        */
>>>> +       for (i = 0; i < nr_nodes; i++)
>>>> +               memset(buf + i * sizeof(struct sd_node)
>>>> +                               + offsetof(struct sd_node, rb),
>>>> +                               '\0', sizeof(struct rb_node));
>>>> +
>>>>          snprintf(path, sizeof(path), "%s%08u", epoch_path, epoch);
>>>>
>>>>          ret = atomic_create_and_write(path, buf, len, true);
>>>> --
>>>> 1.8.3.2
>>>>
>>>>
>>>> --
>>>> sheepdog mailing list
>>>> sheepdog at lists.wpkg.org
>>>> http://lists.wpkg.org/mailman/listinfo/sheepdog
>>





More information about the sheepdog mailing list