[sheepdog-users] [master branch] SIGABRT when doing: dog vdi check

Liu Yuan namei.unix at gmail.com
Wed Jan 8 14:27:21 CET 2014


Okay, I know what happened: you actually created all your nodes in the same
zone. (By default the zone is derived from the node's IP address, which is
why every 127.0.0.1 node ended up in zone 16777343 in your 'dog node list'
output.) Try the following script to create 5 nodes in 5 different zones:

 for x in 0 1 2 3 4; do
     sheep -c local -j size=128M -p 700$x -z $x \
         /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x
 done

Then everything will be fine. (Note the -z option, which assigns each node
its own zone.)
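
You can verify the zone assignment afterwards; with -z each node should
report its own zone in the last column. A sketch of the expected listing
(the V-Nodes counts may differ on your machine):

 # dog node list
   Id   Host:Port         V-Nodes       Zone
    0   127.0.0.1:7000           128          0
    1   127.0.0.1:7001           128          1
    2   127.0.0.1:7002           128          2
    3   127.0.0.1:7003           128          3
    4   127.0.0.1:7004           128          4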

Thanks
Yuan


On Wed, Jan 8, 2014 at 9:09 PM, Marcin Mirosław <marcin at mejor.pl> wrote:

> On 08.01.2014 09:53, Liu Yuan wrote:
> > On Wed, Jan 08, 2014 at 09:47:51AM +0100, Marcin Mirosław wrote:
> >> On 08.01.2014 07:21, Liu Yuan wrote:
> >>> On Tue, Jan 07, 2014 at 03:40:44PM +0100, Marcin Mirosław wrote:
> >>>> On 07.01.2014 14:38, Liu Yuan wrote:
> >>>>> On Tue, Jan 07, 2014 at 01:29:40PM +0100, Marcin Mirosław wrote:
> >>>>>> On 07.01.2014 12:50, Liu Yuan wrote:
> >>>>>>> On Tue, Jan 07, 2014 at 11:14:09AM +0100, Marcin Mirosław wrote:
> >>>>>>>> On 07.01.2014 11:05, Liu Yuan wrote:
> >>>>>>>>> On Tue, Jan 07, 2014 at 10:51:18AM +0100, Marcin Mirosław wrote:
> >>>>>>>>>> On 07.01.2014 03:00, Liu Yuan wrote:
> >>>>>>>>>>> On Mon, Jan 06, 2014 at 05:38:41PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>> On 2014-01-06 08:27, Liu Yuan wrote:
> >>>>>>>>>>>>> On Sat, Jan 04, 2014 at 04:13:27PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>> On 2014-01-04 06:28, Liu Yuan wrote:
> >>>>>>>>>>>>>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm new to "sheep-running" ;) I'm just starting to try
> >>>>>>>>>>>>>>>> sheepdog, so I'm probably doing many things wrong. I'm
> >>>>>>>>>>>>>>>> playing with sheepdog-0.7.6.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> First problem (SIGABRT): I started multiple sheep daemons
> >>>>>>>>>>>>>>>> on localhost:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M \
> >>>>>>>>>>>>>>>>     -p 700$x /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Next:
> >>>>>>>>>>>>>>>> # dog cluster info
> >>>>>>>>>>>>>>>> Cluster status: Waiting for cluster to be formatted
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> # dog cluster format -c 2:1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 0.7.6 doesn't support erasure coding. Try the latest master
> >>>>>>>>>>>>>>> branch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now I'm on 486ace8ccbb [master]. How should I check the
> >>>>>>>>>>>>>> chosen redundancy?
> >>>>>>>>>>>>>>  # cat /mnt/test/vdi/list
> >>>>>>>>>>>>>>    Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> >>>>>>>>>>>>>>    testowy      0  1.0 GB  0.0 MB  0.0 MB 2014-01-04 15:07   cac836       3
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here I can see 3 copies, but I can't see any info about how
> >>>>>>>>>>>>>> many parity strips are configured. Probably this isn't
> >>>>>>>>>>>>>> implemented yet?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Not yet. But currently you can run 'dog cluster info -s' to
> >>>>>>>>>>>>> see the global policy scheme x:y (the one you set with 'dog
> >>>>>>>>>>>>> cluster format -c x:y').
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With erasure coding, 'copies' takes on another meaning: the
> >>>>>>>>>>>>> total number of data + parity objects. In your case, it is
> >>>>>>>>>>>>> 2+1=3. But as you said, this is confusing; I'm thinking of
> >>>>>>>>>>>>> adding an extra field to indicate the redundancy scheme per
> >>>>>>>>>>>>> vdi.
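> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example, something like this (a sketch; the exact output
> >>>>>>>>>>>>> of 'info -s' may look different):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  # dog cluster format -c 2:1   # 2 data strips + 1 parity strip
> >>>>>>>>>>>>>  # dog cluster info -s         # reports the global 2:1 scheme
> >>>>>>>>>>>>>  # dog vdi list                # 'Copies' shows 2 + 1 = 3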
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well, as for the above issue, I can't reproduce it. Could you
> >>>>>>>>>>>>> give me more environment information, such as whether your OS
> >>>>>>>>>>>>> is 32- or 64-bit? What is your distro?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi!
> >>>>>>>>>>>> I'm using 64-bit Gentoo, gcc version 4.7.3 (Gentoo Hardened
> >>>>>>>>>>>> 4.7.3-r1 p1.4, pie-0.5.5), kernel 3.10 with Gentoo patches.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Does the problem still exist? I can't reproduce the issue yet.
> >>>>>>>>>>> So how did you reproduce it, step by step?
> >>>>>>>>>>
> >>>>>>>>>> Hi!
> >>>>>>>>>> I installed sheepdog-0.7.x, then:
> >>>>>>>>>> # mkdir -p /mnt/sheep/{metadata,storage}
> >>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
> >>>>>>>>>>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>> # dog cluster format -c 2
> >>>>>>>>>> using backend plain store
> >>>>>>>>>> # dog vdi create testowy 5G
> >>>>>>>>>> # dog  vdi check testowy
> >>>>>>>>>> PANIC: can't find next new idx
> >>>>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>>>> dog() [0x4058da]
> >>>>>>>>>> [...]
> >>>>>>>>>>
> >>>>>>>>>> I'm getting SIGABRT on every try.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On the same machine, with the master branch (not stable-0.7), you
> >>>>>>>>> mentioned you can't reproduce the problem?
> >>>>>>>>
> >>>>>>>> With the master branch (commit a79e69f9ad9c5) I'm getting this
> >>>>>>>> message:
> >>>>>>>> # dog  vdi check testowy
> >>>>>>>> PANIC: can't find a valid vnode
> >>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>> dog() [0x4057fa]
> >>>>>>>> /lib64/libpthread.so.0(+0xfd8f) [0x7f6d43cd0d8f]
> >>>>>>>> /lib64/libc.so.6(gsignal+0x38) [0x7f6d43951368]
> >>>>>>>> /lib64/libc.so.6(abort+0x147) [0x7f6d439526c7]
> >>>>>>>> dog() [0x40336e]
> >>>>>>>> dog() [0x409d9f]
> >>>>>>>> dog() [0x40cea5]
> >>>>>>>> dog() [0x403927]
> >>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f6d4393dc04]
> >>>>>>>> dog() [0x403c6c]
> >>>>>>>>
> >>>>>>>> Would a full gdb backtrace be useful?
> >>>>>>>
> >>>>>>> Hmm, before you run 'dog vdi check', what is the output of 'dog
> >>>>>>> cluster info', 'dog node list', and 'dog node md info --all'?
> >>>>>>
> >>>>>> Output using the master branch:
> >>>>>> # dog cluster info
> >>>>>> Cluster status: running, auto-recovery enabled
> >>>>>>
> >>>>>> Cluster created at Tue Jan  7 13:21:53 2014
> >>>>>>
> >>>>>> Epoch Time           Version
> >>>>>> 2014-01-07 13:21:54      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >>>>>> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>>>>>
> >>>>>> # dog node list
> >>>>>>   Id   Host:Port         V-Nodes       Zone
> >>>>>>    0   127.0.0.1:7000           128   16777343
> >>>>>>    1   127.0.0.1:7001           128   16777343
> >>>>>>    2   127.0.0.1:7002           128   16777343
> >>>>>>    3   127.0.0.1:7003           128   16777343
> >>>>>>    4   127.0.0.1:7004           128   16777343
> >>>>>>
> >>>>>> # dog node md info --all
> >>>>>> Id      Size    Used    Avail   Use%    Path
> >>>>>> Node 0:
> >>>>>>  0      4.4 GB  4.0 MB  4.4 GB    0%    /mnt/sheep/storage/0
> >>>>>> Node 1:
> >>>>>>  0      4.4 GB  0.0 MB  4.4 GB    0%    /mnt/sheep/storage/1
> >>>>>> Node 2:
> >>>>>>  0      4.4 GB  0.0 MB  4.4 GB    0%    /mnt/sheep/storage/2
> >>>>>> Node 3:
> >>>>>>  0      4.4 GB  0.0 MB  4.4 GB    0%    /mnt/sheep/storage/3
> >>>>>> Node 4:
> >>>>>>  0      4.4 GB  0.0 MB  4.4 GB    0%    /mnt/sheep/storage/4
> >>>>>>
> >>>>>
> >>>>> The very strange thing in your output is that only 1 copy was
> >>>>> actually written when you executed 'dog vdi create', even though you
> >>>>> formatted the cluster with two copies specified.
> >>>>>
> >>>>> You can verify this by
> >>>>>
> >>>>> ls /mnt/sheep/storage/*/
> >>>>>
> >>>>> I guess you'll only see one object. I don't know why this happened.
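> >>>>>
> >>>>> With two copies, the same object should show up under two of the five
> >>>>> stores, e.g. (a sketch; which two directories hold it depends on the
> >>>>> hash):
> >>>>>
> >>>>>  /mnt/sheep/storage/0/:
> >>>>>  80cac83600000000
> >>>>>
> >>>>>  /mnt/sheep/storage/3/:
> >>>>>  80cac83600000000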
> >>>>
> >>>> It is as you said:
> >>>> # ls /mnt/sheep/storage/*/
> >>>> /mnt/sheep/storage/0/:
> >>>> 80cac83600000000
> >>>>
> >>>> /mnt/sheep/storage/1/:
> >>>>
> >>>> /mnt/sheep/storage/2/:
> >>>>
> >>>> /mnt/sheep/storage/3/:
> >>>>
> >>>> /mnt/sheep/storage/4/:
> >>>>
> >>>>
> >>>> Now I'm on commit a79e69f9ad9c and the problem still exists for me
> >>>> (contrary to 0.7-stable). I noticed that the files "sheepdog_shm" and
> >>>> "lock" appeared in my /tmp. Is that correct?
> >>>>
> >
> > "lock" isn't created by the sheep daemon as far as I know. We create
> > sheepdog_locks for the local driver.
> >
> >>>
> >>> I suspect there is actually only one node in the cluster, so 'vdi
> >>> check' panics out.
> >>>
> >>> Before you run 'vdi check':
> >>>
> >>> for i in `seq 0 4`; do dog cluster info -p 700$i; done
> >>>
> >>> Is the output the same on every node?
> >>>
> >>>
> >>> for i in `seq 0 4`; do dog node list -p 700$i; done
> >>>
> >>> Same too?
> >>
> >> Hi!
> >> The output looks like this:
> >>
> >> # for i in `seq 0 4`;do dog cluster info -p 700$i;done
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan  8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan  8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan  8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan  8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan  8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>
> >> # for i in `seq 0 4`;do dog node list -p 700$i;done
> >>   Id   Host:Port         V-Nodes       Zone
> >>    0   127.0.0.1:7000           128   16777343
> >>    1   127.0.0.1:7001           128   16777343
> >>    2   127.0.0.1:7002           128   16777343
> >>    3   127.0.0.1:7003           128   16777343
> >>    4   127.0.0.1:7004           128   16777343
> >>   Id   Host:Port         V-Nodes       Zone
> >>    0   127.0.0.1:7000           128   16777343
> >>    1   127.0.0.1:7001           128   16777343
> >>    2   127.0.0.1:7002           128   16777343
> >>    3   127.0.0.1:7003           128   16777343
> >>    4   127.0.0.1:7004           128   16777343
> >>   Id   Host:Port         V-Nodes       Zone
> >>    0   127.0.0.1:7000           128   16777343
> >>    1   127.0.0.1:7001           128   16777343
> >>    2   127.0.0.1:7002           128   16777343
> >>    3   127.0.0.1:7003           128   16777343
> >>    4   127.0.0.1:7004           128   16777343
> >>   Id   Host:Port         V-Nodes       Zone
> >>    0   127.0.0.1:7000           128   16777343
> >>    1   127.0.0.1:7001           128   16777343
> >>    2   127.0.0.1:7002           128   16777343
> >>    3   127.0.0.1:7003           128   16777343
> >>    4   127.0.0.1:7004           128   16777343
> >>   Id   Host:Port         V-Nodes       Zone
> >>    0   127.0.0.1:7000           128   16777343
> >>    1   127.0.0.1:7001           128   16777343
> >>    2   127.0.0.1:7002           128   16777343
> >>    3   127.0.0.1:7003           128   16777343
> >>    4   127.0.0.1:7004           128   16777343
> >>
> >>
> >
> > Everything looks fine. It is very weird. And with 5 nodes, only 1 copy
> > was written successfully. I have no idea what happened, and I can't
> > reproduce the problem on my local machine.
>
> I started only two sheep daemons and turned on the debug log level on the
> nodes. There is something that looks suspicious to me in the master (port
> 7000) sheep.log:
> Jan 08 14:01:58  DEBUG [main] clear_client_info(826) connection seems to
> be dead
>
> I'm attaching logs from both sheep daemons.
>
> Marcin
>
>