[Sheepdog] [PATCH] sheep: modify cached_epoch

Mon Mar 19 06:55:08 CET 2012

On Mon, Mar 19, 2012 at 11:26 AM, huxinwei <huxinwei at huawei.com> wrote:

> Hi,
>
> Similar problems happened to me a while ago, even without the cache.
> However, I think the real question is: “What’s the expected behavior of
> formatting a running cluster?”
>
> Has this been discussed before? I’m wondering if you already have an
> answer for this.
>

Yes, the root cause is formatting a running cluster.
From my test, formatting clears the object and VDI bitmaps. That seems
right to me, though I have not tested it carefully. Are there other errors
in your test?

I am not sure what the expected behavior of this kind of formatting is. I
think sheepdog should support it.

Thanks,
Haiting

> From: sheepdog-bounces at lists.wpkg.org [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of HaiTing Yao
> Sent: Monday, March 19, 2012 10:44 AM
> To: Liu Yuan
> Cc: HaiTing Yao; sheepdog at lists.wpkg.org
> Subject: Re: [Sheepdog] [PATCH] sheep: modify cached_epoch
>
> On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <namei.unix at gmail.com> wrote:
>
> On 03/16/2012 04:43 PM, yaohaiting.wujue at gmail.com wrote:
>
> > From: HaiTing Yao <wujue.yht at taobao.com>
> >
> > cached_epoch is a __thread variable. If it is greater than 1, formatting
> > the cluster again leads to a permanent I/O error.
> >
> > Signed-off-by: HaiTing Yao <wujue.yht at taobao.com>
> > ---
> >  sheep/sdnet.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> > index 5db9f29..d693858 100644
> > --- a/sheep/sdnet.c
> > +++ b/sheep/sdnet.c
> > @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)
> >       if (before(epoch, cached_epoch)) {
> >               eprintf("requested epoch is smaller than the previous one: %d < %d\n",
> >                       epoch, cached_epoch);
> > -             return -1;
> > +             /* cluster format again */
> > +             if (sys->epoch == 1)
> > +                     cached_epoch = 0;
> > +             else
> > +                     return -1;
> >       }
> >       if (after(epoch, cached_epoch)) {
> >               for (i = 0; i < SD_MAX_NODES; i++) {
>
> Any script that can reproduce this issue?
>
> Thanks,
> Yuan
>
> Please try this script, thanks.
>
> The error log looks like this:
>
> Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000
> Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 < 2
> Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002
> Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129
> Mar 19 10:28:14 client_handler(557) closed connection 11
>
> test-cached.sh:
>
> set -x
>
> sudo killall sheep
> sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4
>
> echo "test cached epoch" > ~/tmp-cached
> sudo sheep -d ~/s1 -z 1
> sudo sheep -d ~/s2 -z 2 -p 7002
> sudo sheep -d ~/s3 -z 3 -p 7003
> sudo sheep -d ~/s4 -z 4 -p 7004
>
> sleep 60
> collie cluster format
> collie vdi create v1 64M
> sleep 30
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> # stop the zone-4 node; the cluster epoch moves from 1 to 2
> ps -ef | grep "\-z 4" | awk '{print $2}' | xargs sudo kill
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
> sleep 6
>
> # format the running cluster again; the epoch restarts at 1
> collie cluster format
> collie vdi create v1 64M
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> Best Regards
>