[Sheepdog] panic in get_nth_node

Liu Yuan namei.unix at gmail.com
Thu Mar 15 13:14:25 CET 2012


On 03/15/2012 11:04 AM, huxinwei wrote:

> Hi Yuan,
>   I did reproduce the issue yesterday. Here's the related log:
> 
> Mar 14 22:33:01 request_obj_list(1702) 30064771072002
> Mar 14 22:33:01 screen_obj_list(1745) 2
> Mar 14 22:33:01 fill_obj_list(1814) 2
> Mar 14 22:33:01 recover_object(1479) done:0 count:2, oid:71d60700000000
> Mar 14 22:33:01 err_to_sderr(118) object 0071d60700000000 not found locally
> Mar 14 22:33:01 do_recover_object(1412) try recover object 71d60700000000 from epoch 36
> Mar 14 22:33:01 find_tgt_node(1195) 34, 64, 1, 67, 128, 2, 1
> Mar 14 22:33:01 recover_object_from_replica(1318) 192.168.136.130, 7000
> Mar 14 22:33:01 recover_object_from_replica(1361) failed, res: 2
> Mar 14 22:33:01 do_recover_object(1412) try recover object 71d60700000000 from epoch 35
> Mar 14 22:33:01 find_tgt_node(1195) 67, 128, 2, 34, 64, 1, 1
> Mar 14 22:33:01 get_nth_node(193) bug
> Mar 14 22:33:01 log_sigexit(361) sheep pid 17851 exiting.
> 
>   I think I figured out what's wrong this time. So here's a new patch that obsoletes the previous one.
> Would you help review it?
> 
> There can be cases where, in some epoch, sheepdog cannot maintain the required number of replica copies.
> When recovering from such an epoch, we'd better be conservative and double check.
> 
> Signed-off-by: Xinwei Hu <huxinwei at huawei.com>
> ---
>  sheep/store.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/sheep/store.c b/sheep/store.c
> index 9f8a2c6..b875bc5 100644
> --- a/sheep/store.c
> +++ b/sheep/store.c
> @@ -1326,6 +1326,13 @@ again:
> 
>         dprintf("try recover object %"PRIx64" from epoch %"PRIu32"\n", oid, tgt_epoch);
> 
> +       if (cur_copies <= copy_idx) {
> +               eprintf("epoch (%"PRIu32") has fewer copies (%d) than requested copy_idx: %d\n",
> +                               tgt_epoch, cur_copies, copy_idx);
> +               ret = -1;
> +               goto err;
> +       }
> +
>         tgt_idx = find_tgt_node(old, old_nr, old_idx, old_copies,
>                         cur, cur_nr, cur_idx, cur_copies, copy_idx);
>         if (tgt_idx < 0) {
> --
> 1.7.1


Hi Xinwei,

Would you mind rebasing the patch? I can't apply it with 'git am'.

Thanks,
Yuan
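
For readers following the thread, the guard proposed in the quoted patch can be looked at in isolation. The following is only a minimal sketch, not sheepdog code: check_copy_idx is a hypothetical helper name, the int types for cur_copies and copy_idx are an assumption based on the %d format in the patch, and fprintf(stderr, ...) stands in for sheepdog's eprintf. It rejects a recovery attempt when the target epoch holds no more copies than the requested copy index.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of sheepdog.
 *
 * Returns 0 when copy_idx refers to a copy that exists in the target
 * epoch, and -1 when the epoch has too few copies (cur_copies <= copy_idx),
 * which is the condition the quoted patch checks before find_tgt_node().
 */
static int check_copy_idx(uint32_t tgt_epoch, int cur_copies, int copy_idx)
{
	if (cur_copies <= copy_idx) {
		/* Stand-in for eprintf(); assumed to log to stderr here. */
		fprintf(stderr,
			"epoch (%" PRIu32 ") has fewer copies (%d) than requested copy_idx: %d\n",
			tgt_epoch, cur_copies, copy_idx);
		return -1;
	}
	return 0;
}
```

As an illustration with made-up values, check_copy_idx(35, 1, 1) returns -1 because an epoch with a single copy has no copy at index 1, while check_copy_idx(36, 2, 1) returns 0. In the patch itself the failing branch sets ret = -1 and jumps to the err label instead of returning directly.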


