[Sheepdog] [PATCH] sheep: modify cached_epoch

Liu Yuan namei.unix at gmail.com
Tue Mar 20 03:08:15 CET 2012


On 03/19/2012 10:44 AM, HaiTing Yao wrote:

> 
> On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> 
>     On 03/16/2012 04:43 PM, yaohaiting.wujue at gmail.com wrote:
> 
>     > From: HaiTing Yao <wujue.yht at taobao.com>
>     >
>     > cached_epoch is a __thread variable. If it is greater than 1, formatting
>     > the cluster again leads to a permanent I/O error.
>     >
>     > Signed-off-by: HaiTing Yao <wujue.yht at taobao.com>
>     > ---
>     >  sheep/sdnet.c |    6 +++++-
>     >  1 files changed, 5 insertions(+), 1 deletions(-)
>     >
>     > diff --git a/sheep/sdnet.c b/sheep/sdnet.c
>     > index 5db9f29..d693858 100644
>     > --- a/sheep/sdnet.c
>     > +++ b/sheep/sdnet.c
>     > @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)
>     >       if (before(epoch, cached_epoch)) {
>     >               eprintf("requested epoch is smaller than the previous one: %d < %d\n",
>     >                       epoch, cached_epoch);
>     > -             return -1;
>     > +             /* cluster format again */
>     > +             if (sys->epoch == 1)
>     > +                     cached_epoch = 0;
>     > +             else
>     > +                     return -1;
>     >       }
>     >       if (after(epoch, cached_epoch)) {
>     >               for (i = 0; i < SD_MAX_NODES; i++) {
> 
> 
>     Any script that can reproduce this issue?
> 
> 
>     Thanks,
>     Yuan
> 
>  
> Please try this script, thanks
>  
> The error log looks like this:
>  
> Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000
> Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 < 2
> Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002
> Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129
> Mar 19 10:28:14 client_handler(557) closed connection 11
>
> test-cached.sh
>  
> set -x
> sudo killall sheep
> sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4
> echo "test cached epoch" > ~/tmp-cached
> sudo sheep -d ~/s1 -z 1
> sudo sheep -d ~/s2 -z 2 -p 7002
> sudo sheep -d ~/s3 -z 3 -p 7003
> sudo sheep -d ~/s4 -z 4 -p 7004
> sleep 60
> collie cluster format
> collie vdi create v1 64M
> sleep 30
> collie vdi write v1 0 1024 < ~/tmp-cached
> ps -ef | grep "\-z 4" | awk '{print $2}' | xargs sudo kill
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
> sleep 6
> collie cluster format
> collie vdi create v1 64M
> sleep 60
> collie vdi write v1 0 1024 < ~/tmp-cached
>
> Best Regards


I can't reproduce the issue using the following script:

for i in 0 1 2 3 4 5 6; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; sleep 1; done
collie/collie cluster format -b farm
qemu-img create -f raw sheepdog:test 10G
~/qemu-devel/qemu-io -c "write -P 0x1 0 10M" sheepdog:test
for i in 2; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i"; done
~/qemu-devel/qemu-io -c "write -P 0x2 0 10M" sheepdog:test
collie/collie cluster format -b farm
qemu-img create -f raw sheepdog:test 10G
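
To make the failure mode under discussion concrete, here is a minimal
sketch of the epoch check. It is not the actual sheep/sdnet.c code. It
assumes before() and after() are wrapping 32-bit comparisons;
check_epoch() and replay_reported_sequence() are hypothetical names, and
the sys_epoch parameter stands in for sys->epoch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: wrapping 32-bit "earlier than" / "later than". */
static int before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

/* Per-thread cache of the newest epoch seen, as in the patch. */
static __thread uint32_t cached_epoch;

/*
 * Hypothetical stand-in for the epoch check in get_sheep_fd().
 * sys_epoch models sys->epoch; patched selects the proposed behaviour.
 * Returns 0 if the request may proceed, -1 if it is rejected.
 */
static int check_epoch(uint32_t epoch, uint32_t sys_epoch, int patched)
{
	if (before(epoch, cached_epoch)) {
		if (patched && sys_epoch == 1)
			cached_epoch = 0;	/* cluster was formatted again */
		else
			return -1;
	}
	if (after(epoch, cached_epoch))
		cached_epoch = epoch;	/* the real code also loops over SD_MAX_NODES here */
	return 0;
}

/* Replays the epochs implied by the reported log: 1, then 2, then 1 again. */
static int replay_reported_sequence(int patched)
{
	int ret;

	cached_epoch = 0;
	ret = check_epoch(1, 1, patched);	/* first format: epoch 1 */
	assert(ret == 0);
	ret = check_epoch(2, 2, patched);	/* a node leaves: epoch 2 */
	assert(ret == 0);
	return check_epoch(1, 1, patched);	/* second format: epoch 1 again */
}
```

With patched set to 0 the last call returns -1, which corresponds to the
"1 < 2" message in the quoted log; with patched set to 1 it returns 0.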

Thanks,
Yuan



