<div dir="ltr">Okay, I know what happened: you actually created 2 nodes in the same zone. Try the following script to create 5 nodes in 5 zones:<div><br></div><div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x -z $x /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done</span><br>
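</div><div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">Then everything will be fine. (Note the -z option: it puts each sheep in its own zone. Without it, all local sheeps share one zone, 16777343 in your 'dog node list' output, and since sheepdog places each copy of an object in a different zone, only one copy can be written.)</span></div>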
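<div>For anyone hitting the same panic, here is a minimal sketch of a sanity check to run before 'dog cluster format -c 2'. It assumes, consistent with the 16777343 zone in the 'dog node list' output quoted below but not checked against the sheep source, that a sheep started without -z gets its IPv4 address (first octet as the least significant byte) as its zone ID, and that every copy needs a distinct zone. The helper names ip_to_default_zone and count_distinct_zones are hypothetical; they are not part of dog.</div>

```shell
#!/bin/sh
# Hypothetical helpers for checking the sheep zone layout before formatting.

# ip_to_default_zone ADDR
# Prints the zone ID a sheep would get without -z, ASSUMING the default zone
# is the four IPv4 octets read as a little-endian 32-bit integer
# (127.0.0.1 -> 16777343, matching the 'dog node list' output in this thread).
ip_to_default_zone() {
    old_ifs=$IFS
    IFS=.
    # Split "a.b.c.d" into the function's positional parameters $1..$4.
    set -- $1
    IFS=$old_ifs
    # The first octet is the least significant byte.
    echo $(( $4 * 16777216 + $3 * 65536 + $2 * 256 + $1 ))
}

# count_distinct_zones
# Reads 'dog node list' output on stdin (header "Id Host:Port V-Nodes Zone"),
# skips the header and prints how many distinct values the Zone column has.
count_distinct_zones() {
    awk 'NR > 1 && NF >= 4 { print $4 }' | sort -u | wc -l | tr -d ' '
}
```

<div>Usage: pipe 'dog node list' into count_distinct_zones. For the five local sheeps started without -z it prints 1; with the -z loop above it should print 5, which leaves room for 2 copies.</div>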
<div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">Thanks</span></div><div><span style="color:rgb(80,0,80);font-family:arial,sans-serif;font-size:13px">Yuan</span></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jan 8, 2014 at 9:09 PM, Marcin Mirosław <span dir="ltr"><<a href="mailto:marcin@mejor.pl" target="_blank">marcin@mejor.pl</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 08.01.2014 09:53, Liu Yuan wrote:<br>
<div><div class="h5">> On Wed, Jan 08, 2014 at 09:47:51AM +0100, Marcin Mirosław wrote:<br>
>> On 08.01.2014 07:21, Liu Yuan wrote:<br>
>>> On Tue, Jan 07, 2014 at 03:40:44PM +0100, Marcin Mirosław wrote:<br>
>>>> On 07.01.2014 14:38, Liu Yuan wrote:<br>
>>>>> On Tue, Jan 07, 2014 at 01:29:40PM +0100, Marcin Mirosław wrote:<br>
>>>>>> On 07.01.2014 12:50, Liu Yuan wrote:<br>
>>>>>>> On Tue, Jan 07, 2014 at 11:14:09AM +0100, Marcin Mirosław wrote:<br>
>>>>>>>> On 07.01.2014 11:05, Liu Yuan wrote:<br>
>>>>>>>>> On Tue, Jan 07, 2014 at 10:51:18AM +0100, Marcin Mirosław wrote:<br>
>>>>>>>>>> On 07.01.2014 03:00, Liu Yuan wrote:<br>
>>>>>>>>>>> On Mon, Jan 06, 2014 at 05:38:41PM +0100, Marcin Mirosław wrote:<br>
>>>>>>>>>>>> On 2014-01-06 08:27, Liu Yuan wrote:<br>
>>>>>>>>>>>>> On Sat, Jan 04, 2014 at 04:13:27PM +0100, Marcin Mirosław wrote:<br>
>>>>>>>>>>>>>> On 2014-01-04 06:28, Liu Yuan wrote:<br>
>>>>>>>>>>>>>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:<br>
>>>>>>>>>>>>>>>> Hi!<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> Hi all!<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>>> I'm new to the "sheep-run" ;) I'm just starting to try sheepdog, so I'm probably<br>
>>>>>>>>>>>>>>>> doing many things wrong. I'm playing with sheepdog-0.7.6.<br>
>>>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>>> First problem (SIGABRT): I started multiple sheep daemons on<br>
>>>>>>>>>>>>>>>> localhost:<br>
>>>>>>>>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done<br>
>>>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>>> Next:<br>
>>>>>>>>>>>>>>>> # dog cluster info<br>
>>>>>>>>>>>>>>>> Cluster status: Waiting for cluster to be formatted<br>
>>>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>>> # dog cluster format -c 2:1<br>
>>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>> 0.7.6 doesn't support erasure coding. Try the latest master branch.<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> Now I'm on 486ace8ccbb [master]. How should I check the chosen redundancy?<br>
>>>>>>>>>>>>>> # cat /mnt/test/vdi/list<br>
>>>>>>>>>>>>>> Name Id Size Used Shared Creation time VDI id Copies Tag<br>
>>>>>>>>>>>>>> testowy 0 1.0 GB 0.0 MB 0.0 MB 2014-01-04 15:07 cac836 3<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> Here I can see 3 copies, but I can't see how many parity strips<br>
>>>>>>>>>>>>>> are configured. Probably this isn't implemented yet?<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>> Not yet. But currently you can run 'dog cluster info -s' to see the global redundancy<br>
>>>>>>>>>>>>> policy x:y (the one you set with 'dog cluster format -c x:y').<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>> With erasure coding, 'copies' has another meaning: the total number of<br>
>>>>>>>>>>>>> data + parity objects. In your case, it is 2+1=3. But as you said, this is<br>
>>>>>>>>>>>>> confusing, so I'm thinking of adding an extra field to indicate the redundancy scheme per vid.<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>> Well, as for the above issue, I can't reproduce it. Could you give me more environment<br>
>>>>>>>>>>>>> information, such as whether your OS is 32- or 64-bit, and what your distro is?<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> Hi!<br>
>>>>>>>>>>>> I'm using 64-bit Gentoo, gcc version 4.7.3 (Gentoo Hardened 4.7.3-r1<br>
>>>>>>>>>>>> p1.4, pie-0.5.5), kernel 3.10 with Gentoo patches.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>><br>
>>>>>>>>>>> Does the problem still exist? I still can't reproduce it. How exactly do you<br>
>>>>>>>>>>> reproduce it, step by step?<br>
>>>>>>>>>><br>
>>>>>>>>>> Hi!<br>
>>>>>>>>>> I install sheepdog-0.7.x, then:<br>
>>>>>>>>>> # mkdir -p /mnt/sheep/{metadata,storage}<br>
>>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done<br>
>>>>>>>>>> # dog cluster format -c 2<br>
>>>>>>>>>> using backend plain store<br>
>>>>>>>>>> # dog vdi create testowy 5G<br>
>>>>>>>>>> # dog vdi check testowy<br>
>>>>>>>>>> PANIC: can't find next new idx<br>
>>>>>>>>>> dog exits unexpectedly (Aborted).<br>
>>>>>>>>>> dog() [0x4058da]<br>
>>>>>>>>>> [...]<br>
>>>>>>>>>><br>
>>>>>>>>>> I'm getting SIGABRT on every try.<br>
>>>>>>>>>><br>
>>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> On the same machine, with the master branch (not stable-0.7), you mentioned you can't<br>
>>>>>>>>> reproduce the problem?<br>
>>>>>>>><br>
>>>>>>>> With the master branch (commit a79e69f9ad9c5) I'm getting this message:<br>
>>>>>>>> # dog vdi check testowy<br>
>>>>>>>> PANIC: can't find a valid vnode<br>
>>>>>>>> dog exits unexpectedly (Aborted).<br>
>>>>>>>> dog() [0x4057fa]<br>
>>>>>>>> /lib64/libpthread.so.0(+0xfd8f) [0x7f6d43cd0d8f]<br>
>>>>>>>> /lib64/libc.so.6(gsignal+0x38) [0x7f6d43951368]<br>
>>>>>>>> /lib64/libc.so.6(abort+0x147) [0x7f6d439526c7]<br>
>>>>>>>> dog() [0x40336e]<br>
>>>>>>>> dog() [0x409d9f]<br>
>>>>>>>> dog() [0x40cea5]<br>
>>>>>>>> dog() [0x403927]<br>
>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f6d4393dc04]<br>
>>>>>>>> dog() [0x403c6c]<br>
>>>>>>>><br>
>>>>>>>> Would a full gdb backtrace be useful?<br>
>>>>>>><br>
>>>>>>> Hmm, before you run 'dog vdi check', what is the output of 'dog cluster info',<br>
>>>>>>> 'dog node list' and 'dog node md info --all'?<br>
>>>>>><br>
>>>>>> Output using the master branch:<br>
>>>>>> # dog cluster info<br>
>>>>>> Cluster status: running, auto-recovery enabled<br>
>>>>>><br>
>>>>>> Cluster created at Tue Jan 7 13:21:53 2014<br>
>>>>>><br>
>>>>>> Epoch Time Version<br>
>>>>>> 2014-01-07 13:21:54 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>>>>>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>>>>>><br>
>>>>>> # dog node list<br>
>>>>>> Id Host:Port V-Nodes Zone<br>
>>>>>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>>>>>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>>>>>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>>>>>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>>>>>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>>>>>><br>
>>>>>> # dog node md info --all<br>
>>>>>> Id Size Used Avail Use% Path<br>
>>>>>> Node 0:<br>
>>>>>> 0 4.4 GB 4.0 MB 4.4 GB 0% /mnt/sheep/storage/0<br>
>>>>>> Node 1:<br>
>>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/1<br>
>>>>>> Node 2:<br>
>>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/2<br>
>>>>>> Node 3:<br>
>>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/3<br>
>>>>>> Node 4:<br>
>>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/4<br>
>>>>>><br>
>>>>><br>
>>>>> The very strange thing in your output is that only 1 copy was actually<br>
>>>>> written when you executed 'dog vdi create', even though you formatted the cluster with<br>
>>>>> two copies.<br>
>>>>><br>
>>>>> You can verify this with<br>
>>>>><br>
>>>>> ls /mnt/sheep/storage/*/<br>
>>>>><br>
>>>>> I guess you will see only one object. I don't know why this happened.<br>
>>>><br>
>>>> It is as you said:<br>
>>>> # ls /mnt/sheep/storage/*/<br>
>>>> /mnt/sheep/storage/0/:<br>
>>>> 80cac83600000000<br>
>>>><br>
>>>> /mnt/sheep/storage/1/:<br>
>>>><br>
>>>> /mnt/sheep/storage/2/:<br>
>>>><br>
>>>> /mnt/sheep/storage/3/:<br>
>>>><br>
>>>> /mnt/sheep/storage/4/:<br>
>>>><br>
>>>><br>
>>>> Now I'm on commit a79e69f9ad9c and the problem still exists for me (as<br>
>>>> opposed to 0.7-stable). I noticed that the files "sheepdog_shm" and "lock"<br>
>>>> appeared in my /tmp. Is that expected?<br>
>>>><br>
><br>
> As far as I know, 'lock' isn't created by the sheep daemon. We create sheepdog_locks for<br>
> the local driver.<br>
><br>
>>><br>
>>> I suspect there is actually only one node in the cluster, so 'vdi check' panics.<br>
>>><br>
>>> Before you run 'vdi check', run<br>
>>><br>
>>> for i in `seq 0 4`;do dog cluster info -p 700$i;done<br>
>>><br>
>>> Is every node's output the same?<br>
>>><br>
>>><br>
>>> for i in `seq 0 4`;do dog node list -p 700$i;done<br>
>>><br>
>>> The same here too?<br>
>><br>
>> Hi!<br>
>> The output looks as below:<br>
>><br>
>> # for i in `seq 0 4`;do dog cluster info -p 700$i;done<br>
>> Cluster status: running, auto-recovery enabled<br>
>><br>
>> Cluster created at Wed Jan 8 09:42:40 2014<br>
>><br>
>> Epoch Time Version<br>
>> 2014-01-08 09:42:41 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>> Cluster status: running, auto-recovery enabled<br>
>><br>
>> Cluster created at Wed Jan 8 09:42:40 2014<br>
>><br>
>> Epoch Time Version<br>
>> 2014-01-08 09:42:40 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>> Cluster status: running, auto-recovery enabled<br>
>><br>
>> Cluster created at Wed Jan 8 09:42:40 2014<br>
>><br>
>> Epoch Time Version<br>
>> 2014-01-08 09:42:41 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>> Cluster status: running, auto-recovery enabled<br>
>><br>
>> Cluster created at Wed Jan 8 09:42:40 2014<br>
>><br>
>> Epoch Time Version<br>
>> 2014-01-08 09:42:40 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>> Cluster status: running, auto-recovery enabled<br>
>><br>
>> Cluster created at Wed Jan 8 09:42:40 2014<br>
>><br>
>> Epoch Time Version<br>
>> 2014-01-08 09:42:40 1 [<a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a>, <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a>,<br>
>> <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a>, <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a>, <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a>]<br>
>><br>
>> # for i in `seq 0 4`;do dog node list -p 700$i;done<br>
>> Id Host:Port V-Nodes Zone<br>
>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>> Id Host:Port V-Nodes Zone<br>
>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>> Id Host:Port V-Nodes Zone<br>
>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>> Id Host:Port V-Nodes Zone<br>
>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>> Id Host:Port V-Nodes Zone<br>
>> 0 <a href="http://127.0.0.1:7000" target="_blank">127.0.0.1:7000</a> 128 16777343<br>
>> 1 <a href="http://127.0.0.1:7001" target="_blank">127.0.0.1:7001</a> 128 16777343<br>
>> 2 <a href="http://127.0.0.1:7002" target="_blank">127.0.0.1:7002</a> 128 16777343<br>
>> 3 <a href="http://127.0.0.1:7003" target="_blank">127.0.0.1:7003</a> 128 16777343<br>
>> 4 <a href="http://127.0.0.1:7004" target="_blank">127.0.0.1:7004</a> 128 16777343<br>
>><br>
>><br>
><br>
> Everything looks fine. It is very weird: with 5 nodes, only 1 copy was written<br>
> successfully. I have no idea what happened, and I can't reproduce the problem on my<br>
> local machine.<br>
<br>
</div></div>I started only two sheeps and turned on the debug log level on the nodes. There<br>
is something suspicious in the master's (port 7000) sheep.log:<br>
Jan 08 14:01:58 DEBUG [main] clear_client_info(826) connection seems to be dead<br>
<br>
I'm attaching logs from both sheeps.<br>
<span class="HOEnZb"><font color="#888888"><br>
Marcin<br>
<br>
</font></span></blockquote></div><br></div>