[sheepdog-users] [master branch] SIGABRT when doing: dog vdi check
Marcin Mirosław
marcin at mejor.pl
Wed Jan 8 14:49:29 CET 2014
On 08.01.2014 14:27, Liu Yuan wrote:
> Okay, I know what happened: all your nodes ended up in the same
> zone. Try the following script to create 5 nodes in 5 zones:
>
> for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x -z $x \
>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
>
> Then everything will be fine. (Note the -z option.)
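>
> As a quick check afterwards (a sketch; Zone is the last column of the
> 'dog node list' output quoted below):
>
> dog node list    # the Zone column should now show 0, 1, 2, 3, 4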
Great! I can confirm it works!
So (quoting the man page): "-z specify the zone id (default: determined
by listen address)" doesn't work at the moment?
> On Wed, Jan 8, 2014 at 9:09 PM, Marcin Mirosław <marcin at mejor.pl> wrote:
>
> On 08.01.2014 09:53, Liu Yuan wrote:
> > On Wed, Jan 08, 2014 at 09:47:51AM +0100, Marcin Mirosław wrote:
> >> On 08.01.2014 07:21, Liu Yuan wrote:
> >>> On Tue, Jan 07, 2014 at 03:40:44PM +0100, Marcin Mirosław wrote:
> >>>> On 07.01.2014 14:38, Liu Yuan wrote:
> >>>>> On Tue, Jan 07, 2014 at 01:29:40PM +0100, Marcin Mirosław wrote:
> >>>>>> On 07.01.2014 12:50, Liu Yuan wrote:
> >>>>>>> On Tue, Jan 07, 2014 at 11:14:09AM +0100, Marcin Mirosław wrote:
> >>>>>>>> On 07.01.2014 11:05, Liu Yuan wrote:
> >>>>>>>>> On Tue, Jan 07, 2014 at 10:51:18AM +0100, Marcin Mirosław wrote:
> >>>>>>>>>> On 07.01.2014 03:00, Liu Yuan wrote:
> >>>>>>>>>>> On Mon, Jan 06, 2014 at 05:38:41PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>> On 2014-01-06 08:27, Liu Yuan wrote:
> >>>>>>>>>>>>> On Sat, Jan 04, 2014 at 04:13:27PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>> On 2014-01-04 06:28, Liu Yuan wrote:
> >>>>>>>>>>>>>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm new to the "sheep run" ;) I'm starting to try
> >>>>>>>>>>>>>>>> sheepdog, so I'm probably doing many things wrong.
> >>>>>>>>>>>>>>>> I'm playing with sheepdog-0.7.6.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> First problem (SIGABRT): I started multiple sheep
> >>>>>>>>>>>>>>>> daemons on localhost:
> >>>>>>>>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
> >>>>>>>>>>>>>>>>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Next:
> >>>>>>>>>>>>>>>> # dog cluster info
> >>>>>>>>>>>>>>>> Cluster status: Waiting for cluster to be formatted
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> # dog cluster format -c 2:1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 0.7.6 doesn't support erasure coding. Try the latest
> >>>>>>>>>>>>>>> master branch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now I'm on 486ace8ccbb [master]. How should I check
> >>>>>>>>>>>>>> the chosen redundancy?
> >>>>>>>>>>>>>> # cat /mnt/test/vdi/list
> >>>>>>>>>>>>>> Name     Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag
> >>>>>>>>>>>>>> testowy   0  1.0 GB  0.0 MB  0.0 MB  2014-01-04 15:07  cac836  3
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here I can see 3 copies, but no info about how many
> >>>>>>>>>>>>>> parity strips are configured. Probably this isn't
> >>>>>>>>>>>>>> implemented yet?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Not yet. But currently you can run 'dog cluster info -s'
> >>>>>>>>>>>>> to see the global policy scheme x:y (the one you set with
> >>>>>>>>>>>>> 'dog cluster format -c x:y').
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With erasure coding, 'copies' takes on another meaning:
> >>>>>>>>>>>>> the total number of data + parity objects. In your case,
> >>>>>>>>>>>>> it is 2+1=3. But as you said, this is confusing; I'm
> >>>>>>>>>>>>> thinking of adding an extra field to indicate the
> >>>>>>>>>>>>> redundancy scheme per vid.
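> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example (a sketch based on your output above):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # dog cluster format -c 2:1   # 2 data strips + 1 parity strip
> >>>>>>>>>>>>> # dog cluster info -s         # reports the 2:1 scheme
> >>>>>>>>>>>>> # dog vdi list                # Copies column shows 2+1 = 3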
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well, as for the above issue, I can't reproduce it. Could
> >>>>>>>>>>>>> you give me more environment information, such as whether
> >>>>>>>>>>>>> your OS is 32- or 64-bit? What is your distro?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi!
> >>>>>>>>>>>> I'm using 64-bit Gentoo, gcc version 4.7.3 (Gentoo
> >>>>>>>>>>>> Hardened 4.7.3-r1 p1.4, pie-0.5.5), kernel 3.10 with
> >>>>>>>>>>>> Gentoo patches.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Does the problem still exist? I can't reproduce the issue
> >>>>>>>>>>> yet. So how did you reproduce it, step by step?
> >>>>>>>>>>
> >>>>>>>>>> Hi!
> >>>>>>>>>> I installed sheepdog-0.7.x, then:
> >>>>>>>>>> # mkdir -p /mnt/sheep/{metadata,storage}
> >>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
> >>>>>>>>>>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>> # dog cluster format -c 2
> >>>>>>>>>> using backend plain store
> >>>>>>>>>> # dog vdi create testowy 5G
> >>>>>>>>>> # dog vdi check testowy
> >>>>>>>>>> PANIC: can't find next new idx
> >>>>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>>>> dog() [0x4058da]
> >>>>>>>>>> [...]
> >>>>>>>>>>
> >>>>>>>>>> I'm getting SIGABRT on every try.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On the same machine, with the master branch (not stable-0.7),
> >>>>>>>>> you mentioned you can't reproduce the problem?
> >>>>>>>>
> >>>>>>>> With the master branch (commit a79e69f9ad9c5) I'm getting this
> >>>>>>>> message:
> >>>>>>>> # dog vdi check testowy
> >>>>>>>> PANIC: can't find a valid vnode
> >>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>> dog() [0x4057fa]
> >>>>>>>> /lib64/libpthread.so.0(+0xfd8f) [0x7f6d43cd0d8f]
> >>>>>>>> /lib64/libc.so.6(gsignal+0x38) [0x7f6d43951368]
> >>>>>>>> /lib64/libc.so.6(abort+0x147) [0x7f6d439526c7]
> >>>>>>>> dog() [0x40336e]
> >>>>>>>> dog() [0x409d9f]
> >>>>>>>> dog() [0x40cea5]
> >>>>>>>> dog() [0x403927]
> >>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f6d4393dc04]
> >>>>>>>> dog() [0x403c6c]
> >>>>>>>>
> >>>>>>>> Would a full gdb backtrace be useful?
> >>>>>>>
> >>>>>>> Hmm, before you run 'dog vdi check', what is the output of
> >>>>>>> 'dog cluster info', 'dog node list', and 'dog node md info --all'?
> >>>>>>
> >>>>>> Output using master branch:
> >>>>>> # dog cluster info
> >>>>>> Cluster status: running, auto-recovery enabled
> >>>>>>
> >>>>>> Cluster created at Tue Jan 7 13:21:53 2014
> >>>>>>
> >>>>>> Epoch Time           Version
> >>>>>> 2014-01-07 13:21:54  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >>>>>> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>>>>>
> >>>>>> # dog node list
> >>>>>> Id   Host:Port         V-Nodes   Zone
> >>>>>>  0   127.0.0.1:7000        128   16777343
> >>>>>>  1   127.0.0.1:7001        128   16777343
> >>>>>>  2   127.0.0.1:7002        128   16777343
> >>>>>>  3   127.0.0.1:7003        128   16777343
> >>>>>>  4   127.0.0.1:7004        128   16777343
> >>>>>>
> >>>>>> # dog node md info --all
> >>>>>> Id Size Used Avail Use% Path
> >>>>>> Node 0:
> >>>>>> 0 4.4 GB 4.0 MB 4.4 GB 0% /mnt/sheep/storage/0
> >>>>>> Node 1:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/1
> >>>>>> Node 2:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/2
> >>>>>> Node 3:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/3
> >>>>>> Node 4:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/4
> >>>>>>
> >>>>>
> >>>>> The very strange thing in your output is that only 1 copy was
> >>>>> actually written when you executed 'dog vdi create', even though
> >>>>> you formatted the cluster with two copies specified.
> >>>>>
> >>>>> You can verify this with:
> >>>>>
> >>>>> ls /mnt/sheep/storage/*/
> >>>>>
> >>>>> I guess you will only see one object. I don't know why this happened.
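> >>>>>
> >>>>> Something like this counts the objects on each disk (a sketch):
> >>>>>
> >>>>> for d in /mnt/sheep/storage/*/; do echo "$d: $(ls "$d" | wc -l)"; done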
> >>>>
> >>>> It is as you said:
> >>>> # ls /mnt/sheep/storage/*/
> >>>> /mnt/sheep/storage/0/:
> >>>> 80cac83600000000
> >>>>
> >>>> /mnt/sheep/storage/1/:
> >>>>
> >>>> /mnt/sheep/storage/2/:
> >>>>
> >>>> /mnt/sheep/storage/3/:
> >>>>
> >>>> /mnt/sheep/storage/4/:
> >>>>
> >>>>
> >>>> Now I'm on commit a79e69f9ad9c and the problem still exists for me
> >>>> (contrary to 0.7-stable). I noticed that the files "sheepdog_shm"
> >>>> and "lock" appeared in my /tmp. Is that correct?
> >>>>
> >
> > "lock" isn't created by the sheep daemon as far as I know. We create
> > sheepdog_locks for the local cluster driver.
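> >
> > A quick way to see what actually appeared there (a sketch):
> >
> > ls -l /tmp/sheepdog* /tmp/lock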
> >
> >>>
> >>> I suspect there is actually only one node in the cluster, so 'vdi
> >>> check' panics.
> >>>
> >>> Before you run 'vdi check':
> >>>
> >>> for i in `seq 0 4`; do dog cluster info -p 700$i; done
> >>>
> >>> Is the output the same on every node?
> >>>
> >>>
> >>> for i in `seq 0 4`; do dog node list -p 700$i; done
> >>>
> >>> The same here too?
> >>
> >> Hi!
> >> The output looks like this:
> >>
> >> # for i in `seq 0 4`; do dog cluster info -p 700$i; done
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>
> >> # for i in `seq 0 4`; do dog node list -p 700$i; done
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >>
> >>
> >
> > Everything looks fine. It is very weird. And with 5 nodes, only 1 copy
> > was written successfully. I have no idea what happened, and I can't
> > reproduce the problem on my local machine.
>
> I started only two sheep daemons and turned on the debug log level on
> the nodes. There is something that looks suspicious to me in the
> master's (port 7000) sheep.log:
> Jan 08 14:01:58 DEBUG [main] clear_client_info(826) connection seems to
> be dead
>
> I'm attaching the logs from both sheep daemons.
>
> Marcin
>
>
--
xmpp (jabber): marcin [at] mejor.pl
www: http://blog.mejor.pl/