<br>
<div class="gmail_quote">On Fri, Mar 16, 2012 at 6:35 PM, Liu Yuan <span dir="ltr"><<a href="mailto:namei.unix@gmail.com">namei.unix@gmail.com</a>></span> wrote:<br>
<blockquote style="BORDER-LEFT:#ccc 1px solid;MARGIN:0px 0px 0px 0.8ex;PADDING-LEFT:1ex" class="gmail_quote">
<div class="HOEnZb">
<div class="h5">On 03/16/2012 04:43 PM, <a href="mailto:yaohaiting.wujue@gmail.com">yaohaiting.wujue@gmail.com</a> wrote:<br><br>> From: HaiTing Yao <<a href="mailto:wujue.yht@taobao.com">wujue.yht@taobao.com</a>><br>
><br>> cached_epoch is a __thread variable. If it greater than 1, format the<br>> cluster again will lead to permanent I/O error.<br>><br>> Signed-off-by: HaiTing Yao <<a href="mailto:wujue.yht@taobao.com">wujue.yht@taobao.com</a>><br>
> ---<br>> sheep/sdnet.c | 6 +++++-<br>> 1 files changed, 5 insertions(+), 1 deletions(-)<br>><br>> diff --git a/sheep/sdnet.c b/sheep/sdnet.c<br>> index 5db9f29..d693858 100644<br>> --- a/sheep/sdnet.c<br>
> +++ b/sheep/sdnet.c<br>> @@ -832,7 +832,11 @@ int get_sheep_fd(uint8_t *addr, uint16_t port, int node_idx, uint32_t epoch)<br>> if (before(epoch, cached_epoch)) {<br>> eprintf("requested epoch is smaller than the previous one: %d < %d\n",<br>
> epoch, cached_epoch);<br>> - return -1;<br>> + /* cluster format again */<br>> + if (sys->epoch == 1)<br>> + cached_epoch = 0;<br>
> + else<br>> + return -1;<br>> }<br>> if (after(epoch, cached_epoch)) {<br>> for (i = 0; i < SD_MAX_NODES; i++) {<br><br><br></div></div>Any script that can reproduce this issue?</blockquote>
<blockquote style="BORDER-LEFT:#ccc 1px solid;MARGIN:0px 0px 0px 0.8ex;PADDING-LEFT:1ex" class="gmail_quote"><br>Thanks,<br>Yuan<br></blockquote>
<div> </div>
<div>Please try the script below. Thanks.</div>
<div> </div>
<div>The error log looks like this:</div>
<div> </div>
<div>Mar 19 10:28:14 forward_write_obj_req(304) 70912800000000<br>Mar 19 10:28:14 get_sheep_fd(834) requested epoch is smaller than the previous one: 1 &lt; 2<br>Mar 19 10:28:14 forward_write_obj_req(337) failed to connect to 127.0.0.1:7002<br>
Mar 19 10:28:14 do_io_request(785) failed: 1, 70912800000000 , 1, 129<br>Mar 19 10:28:14 client_handler(557) closed connection 11<br></div>
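<div>The rejection in the log can be modelled with a small sketch. This is a simplified illustration under stated assumptions, not the sheep source: check_epoch() and its cluster_epoch and patched parameters are hypothetical names, before() and after() are assumed to be wrap-around-safe comparisons, and updating cached_epoch in the after() branch is assumed from context.</div>

```c
#include <stdint.h>

/* Assumed wrap-around-safe comparisons, modelled on the before()/after()
 * calls in the quoted diff; the definitions in sheep may differ. */
static int before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

static int after(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) < 0;
}

/* Stand-in for the cache in get_sheep_fd(). In sheep it is a __thread
 * variable; a plain static is used here to keep the sketch portable. */
static uint32_t cached_epoch;

/* Hypothetical helper: returns 0 if the request epoch is accepted and
 * -1 if it is rejected. cluster_epoch stands in for sys->epoch, and
 * patched selects the behaviour proposed in the quoted patch. */
static int check_epoch(uint32_t epoch, uint32_t cluster_epoch, int patched)
{
    if (before(epoch, cached_epoch)) {
        if (patched && cluster_epoch == 1)
            cached_epoch = 0;  /* cluster was formatted again: drop stale cache */
        else
            return -1;         /* "requested epoch is smaller than the previous one" */
    }
    if (after(epoch, cached_epoch))
        cached_epoch = epoch;  /* assumption: the real code does more work here */
    return 0;
}
```

<div>With cached_epoch left at 2 by the first cluster, check_epoch(1, 1, 0) returns -1, which corresponds to the "1 &lt; 2" line in the log. With patched set, the stale cache is reset and the same request is accepted.</div>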
<div>test-cached.sh</div>
<div> </div>
<div>set -x</div>
<div>sudo killall sheep<br>sudo rm -rf ~/s1 ~/s2 ~/s3 ~/s4 </div>
<div>echo "test cached epoch" > ~/tmp-cached<br>sudo sheep -d ~/s1 -z 1 <br>sudo sheep -d ~/s2 -z 2 -p 7002 <br>sudo sheep -d ~/s3 -z 3 -p 7003 <br>sudo sheep -d ~/s4 -z 4 -p 7004 </div>
<div>sleep 60</div>
<div>collie cluster format</div>
<div>collie vdi create v1 64M</div>
<div>sleep 30</div>
<div>collie vdi write v1 0 1024 < ~/tmp-cached </div>
<div>ps -ef | grep "[-]z 4" | awk '{print $2}' | xargs sudo kill</div>
<div>sleep 60</div>
<div>collie vdi write v1 0 1024 < ~/tmp-cached </div>
<div>sleep 6</div>
<div>collie cluster format</div>
<div>collie vdi create v1 64M</div>
<div>sleep 60</div>
<div>collie vdi write v1 0 1024 < ~/tmp-cached <br></div>
<div>Best Regards</div></div>