[Sheepdog] [PATCH] sheep: modify cached_epoch

huxinwei huxinwei at huawei.com
Mon Mar 19 04:26:08 CET 2012


Hi,

   I ran into similar problems a while ago, even without the cache.
However, I think the real question is: what is the expected behavior of formatting a running cluster?

    Has this been discussed before? I'm wondering if you already have an answer for this.

From: sheepdog-bounces at lists.wpkg.org [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of HaiTing Yao
Sent: Monday, March 19, 2012 10:44 AM
To: Liu Yuan
Cc: HaiTing Yao; sheepdog at lists.wpkg.org
Subject: Re: [Sheepdog] [PATCH] sheep: modify cached_epoch


On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <namei.unix at gmail.com> wrote:
On 03/16/2012 04:43 PM, yaohaiting.wujue at gmail.com wrote:

> From: HaiTing Yao <wujue.yht at taobao.com>
>
> cached_epoch is a __thread variable. If it is greater than 1, formatting the
> cluster again leads to a permanent I/O error.
>
> Signed-off-by: HaiTing Yao <wujue.yht at taobao.com>
> ---
>  sheep/sdnet.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> index 5db9f29..d693858 100644
> --- a/sheep/sdnet.c
> +++ b/sheep/sdnet.c
> @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)
>       if (before(epoch, cached_epoch)) {
>               eprintf("requested epoch is smaller than the previous one: %d < %d\n",
>                       epoch, cached_epoch);
> -             return -1;
> +             /* cluster format again */
> +             if (sys->epoch == 1)
> +                     cached_epoch = 0;
> +             else
> +                     return -1;
>       }
>       if (after(epoch, cached_epoch)) {
>               for (i = 0; i < SD_MAX_NODES; i++) {

Any script that can reproduce this issue?

Thanks,
Yuan

Please try the test-cached.sh script below, thanks.

The error log looks like this:

Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000
Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 < 2
Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002
Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129
Mar 19 10:28:14 client_handler(557) closed connection 11
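
To make the "1 < 2" message above easier to follow, here is a minimal standalone sketch of the epoch check only. It is not the code in sheep/sdnet.c: check_epoch() and replay_reformat() are hypothetical names, sys_epoch stands in for sys->epoch, the before()/after() helpers are assumed to be wraparound-safe sequence comparisons, and the body of the after() branch is assumed to simply advance the cache (the real loop over SD_MAX_NODES is left out).

```c
#include <stdint.h>

/* Assumed semantics: wraparound-safe "seq1 is older than seq2". */
static int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Assumed semantics: wraparound-safe "seq1 is newer than seq2". */
static int after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Stand-in for the per-thread cached_epoch variable. */
static uint32_t cached_epoch;

/*
 * Hypothetical model of the check: returns 0 if the requested epoch is
 * accepted and -1 if it is rejected. "patched" selects the behavior
 * proposed in the patch (reset the cache when sys_epoch is 1).
 */
static int check_epoch(uint32_t epoch, uint32_t sys_epoch, int patched)
{
	if (before(epoch, cached_epoch)) {
		if (patched && sys_epoch == 1)
			cached_epoch = 0;	/* cluster was formatted again */
		else
			return -1;
	}
	if (after(epoch, cached_epoch))
		cached_epoch = epoch;	/* assumption: cache follows the new epoch */
	return 0;
}

/*
 * Replays the epochs implied by the log: epoch 1 after the first format,
 * epoch 2 after one node is killed, then epoch 1 again after the second
 * format. Returns the result of the last check.
 */
static int replay_reformat(int patched)
{
	cached_epoch = 0;
	check_epoch(1, 1, patched);
	check_epoch(2, 2, patched);
	return check_epoch(1, 1, patched);
}
```

In this model replay_reformat(0) returns -1, which corresponds to the rejected request in the log, while replay_reformat(1) returns 0 because the cache is reset before the comparison. Whether sys->epoch is always 1 right after a format is an assumption the patch itself makes.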

test-cached.sh:

set -x
sudo killall sheep
sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4
echo "test cached epoch" > ~/tmp-cached
sudo sheep -d ~/s1 -z 1
sudo sheep -d ~/s2 -z 2 -p 7002
sudo sheep -d ~/s3 -z 3 -p 7003
sudo sheep -d ~/s4 -z 4 -p 7004
sleep 60
collie cluster format
collie vdi create v1 64M
sleep 30
collie vdi write v1 0 1024 < ~/tmp-cached
ps -ef | grep "\-z 4" | awk '{print $2}' | xargs sudo kill
sleep 60
collie vdi write v1 0 1024 < ~/tmp-cached
sleep 6
collie cluster format
collie vdi create v1 64M
sleep 60
collie vdi write v1 0 1024 < ~/tmp-cached
Best Regards