From: levin li <xingke.lwp at taobao.com> v1 -- > v2: rebased to the current master branch ------------------------------------------------ >8 In the recovery path, sheep may get to old epoch at which some nodes have left the cluster, we shouldn't try to recover objects from these nodes, so I add a check function to check whether the target node is a valid node at current epoch. Signed-off-by: levin li <xingke.lwp at taobao.com> --- sheep/recovery.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/sheep/recovery.c b/sheep/recovery.c index f341fc6..9d1f010 100644 --- a/sheep/recovery.c +++ b/sheep/recovery.c @@ -193,6 +193,20 @@ static void rollback_old_cur(struct sd_vnode *old, int *old_nr, int *old_copies, *old_copies = new_old_copies; } +static int check_entry_valid(struct sd_vnode *entry, struct sd_node *nodes, + int nr_nodes) +{ + int i; + + for (i = 0; i < nr_nodes; i++) { + if (!memcmp(entry->addr, nodes[i].addr, sizeof(entry->addr)) && + entry->port == nodes[i].port) + return 0; + } + + return -1; +} + /* * Recover the object from its track in epoch history. That is, * the routine will try to recovery it from the nodes it has stayed, @@ -223,8 +237,10 @@ again: int idx; idx = obj_to_sheep(old, old_nr, oid, i); tgt_entry = old + idx; - ret = recover_object_from_replica(oid, tgt_entry, - epoch, tgt_epoch); + ret = check_entry_valid(tgt_entry, rw->cur_nodes, rw->cur_nr_nodes); + if (!ret) + ret = recover_object_from_replica(oid, tgt_entry, + epoch, tgt_epoch); if (ret == 0) { /* Succeed */ break; -- 1.7.10 |