[sheepdog-users] [master branch] SIGABRT when doing: dog vdi check
Marcin Mirosław
marcin at mejor.pl
Wed Jan 8 14:49:29 CET 2014
On 08.01.2014 14:27, Liu Yuan wrote:
> Okay, I know what happened: all your nodes ended up in the same
> zone. Try the following script to create 5 nodes in 5 zones:
>
> for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x -z $x \
>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
>
> Then everything will be fine. (Note the -z option.)
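>
> As a quick check afterwards (a sketch; Zone is the last column of the
> 'dog node list' output quoted below):
>
> dog node list    # the Zone column should now show 0, 1, 2, 3, 4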
Great! I can confirm it works!
So (quoting the man page): "-z specify the zone id (default: determined
by listen address)" doesn't work at the moment?
> On Wed, Jan 8, 2014 at 9:09 PM, Marcin Mirosław <marcin at mejor.pl> wrote:
>
> On 08.01.2014 09:53, Liu Yuan wrote:
> > On Wed, Jan 08, 2014 at 09:47:51AM +0100, Marcin Mirosław wrote:
> >> On 08.01.2014 07:21, Liu Yuan wrote:
> >>> On Tue, Jan 07, 2014 at 03:40:44PM +0100, Marcin Mirosław wrote:
> >>>> On 07.01.2014 14:38, Liu Yuan wrote:
> >>>>> On Tue, Jan 07, 2014 at 01:29:40PM +0100, Marcin Mirosław wrote:
> >>>>>> On 07.01.2014 12:50, Liu Yuan wrote:
> >>>>>>> On Tue, Jan 07, 2014 at 11:14:09AM +0100, Marcin Mirosław wrote:
> >>>>>>>> On 07.01.2014 11:05, Liu Yuan wrote:
> >>>>>>>>> On Tue, Jan 07, 2014 at 10:51:18AM +0100, Marcin Mirosław wrote:
> >>>>>>>>>> On 07.01.2014 03:00, Liu Yuan wrote:
> >>>>>>>>>>> On Mon, Jan 06, 2014 at 05:38:41PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>> On 2014-01-06 08:27, Liu Yuan wrote:
> >>>>>>>>>>>>> On Sat, Jan 04, 2014 at 04:13:27PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>> On 2014-01-04 06:28, Liu Yuan wrote:
> >>>>>>>>>>>>>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:
> >>>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi all!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm new to the "sheep run" ;) I'm starting to try
> >>>>>>>>>>>>>>>> sheepdog, so I'm probably doing many things wrong.
> >>>>>>>>>>>>>>>> I'm playing with sheepdog-0.7.6.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> First problem (SIGABRT): I started multiple sheep
> >>>>>>>>>>>>>>>> daemons on localhost:
> >>>>>>>>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
> >>>>>>>>>>>>>>>>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Next:
> >>>>>>>>>>>>>>>> # dog cluster info
> >>>>>>>>>>>>>>>> Cluster status: Waiting for cluster to be formatted
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> # dog cluster format -c 2:1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 0.7.6 doesn't support erasure coding. Try the latest
> >>>>>>>>>>>>>>> master branch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now I'm on 486ace8ccbb [master]. How should I check
> >>>>>>>>>>>>>> the chosen redundancy?
> >>>>>>>>>>>>>> # cat /mnt/test/vdi/list
> >>>>>>>>>>>>>> Name     Id  Size    Used    Shared  Creation time     VDI id  Copies  Tag
> >>>>>>>>>>>>>> testowy   0  1.0 GB  0.0 MB  0.0 MB  2014-01-04 15:07  cac836  3
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here I can see 3 copies, but no info about how many
> >>>>>>>>>>>>>> parity strips are configured. Probably this isn't
> >>>>>>>>>>>>>> implemented yet?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Not yet. But currently you can run 'dog cluster info -s'
> >>>>>>>>>>>>> to see the global policy scheme x:y (the one you set with
> >>>>>>>>>>>>> 'dog cluster format -c x:y').
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With erasure coding, 'copies' takes on another meaning:
> >>>>>>>>>>>>> the total number of data + parity objects. In your case,
> >>>>>>>>>>>>> it is 2+1=3. But as you said, this is confusing; I'm
> >>>>>>>>>>>>> thinking of adding an extra field to indicate the
> >>>>>>>>>>>>> redundancy scheme per vid.
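> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example (a sketch based on your output above):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # dog cluster format -c 2:1   # 2 data strips + 1 parity strip
> >>>>>>>>>>>>> # dog cluster info -s         # reports the 2:1 scheme
> >>>>>>>>>>>>> # dog vdi list                # Copies column shows 2+1 = 3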
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well, as for the above issue, I can't reproduce it. Could
> >>>>>>>>>>>>> you give me more environment information, such as whether
> >>>>>>>>>>>>> your OS is 32- or 64-bit? What is your distro?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi!
> >>>>>>>>>>>> I'm using 64-bit Gentoo, gcc version 4.7.3 (Gentoo
> >>>>>>>>>>>> Hardened 4.7.3-r1 p1.4, pie-0.5.5), kernel 3.10 with
> >>>>>>>>>>>> Gentoo patches.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Does the problem still exist? I can't reproduce the issue
> >>>>>>>>>>> yet. So how did you reproduce it, step by step?
> >>>>>>>>>>
> >>>>>>>>>> Hi!
> >>>>>>>>>> I installed sheepdog-0.7.x, then:
> >>>>>>>>>> # mkdir -p /mnt/sheep/{metadata,storage}
> >>>>>>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
> >>>>>>>>>>     /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>>>>> # dog cluster format -c 2
> >>>>>>>>>> using backend plain store
> >>>>>>>>>> # dog vdi create testowy 5G
> >>>>>>>>>> # dog vdi check testowy
> >>>>>>>>>> PANIC: can't find next new idx
> >>>>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>>>> dog() [0x4058da]
> >>>>>>>>>> [...]
> >>>>>>>>>>
> >>>>>>>>>> I'm getting SIGABRT on every try.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On the same machine, with the master branch (not stable-0.7),
> >>>>>>>>> you mentioned you can't reproduce the problem?
> >>>>>>>>
> >>>>>>>> With the master branch (commit a79e69f9ad9c5) I'm getting this
> >>>>>>>> message:
> >>>>>>>> # dog vdi check testowy
> >>>>>>>> PANIC: can't find a valid vnode
> >>>>>>>> dog exits unexpectedly (Aborted).
> >>>>>>>> dog() [0x4057fa]
> >>>>>>>> /lib64/libpthread.so.0(+0xfd8f) [0x7f6d43cd0d8f]
> >>>>>>>> /lib64/libc.so.6(gsignal+0x38) [0x7f6d43951368]
> >>>>>>>> /lib64/libc.so.6(abort+0x147) [0x7f6d439526c7]
> >>>>>>>> dog() [0x40336e]
> >>>>>>>> dog() [0x409d9f]
> >>>>>>>> dog() [0x40cea5]
> >>>>>>>> dog() [0x403927]
> >>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f6d4393dc04]
> >>>>>>>> dog() [0x403c6c]
> >>>>>>>>
> >>>>>>>> Would a full gdb backtrace be useful?
> >>>>>>>
> >>>>>>> Hmm, before you run 'dog vdi check', what is the output of
> >>>>>>> 'dog cluster info', 'dog node list', and 'dog node md info --all'?
> >>>>>>
> >>>>>> Output using master branch:
> >>>>>> # dog cluster info
> >>>>>> Cluster status: running, auto-recovery enabled
> >>>>>>
> >>>>>> Cluster created at Tue Jan 7 13:21:53 2014
> >>>>>>
> >>>>>> Epoch Time           Version
> >>>>>> 2014-01-07 13:21:54  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >>>>>> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>>>>>
> >>>>>> # dog node list
> >>>>>> Id   Host:Port         V-Nodes   Zone
> >>>>>>  0   127.0.0.1:7000        128   16777343
> >>>>>>  1   127.0.0.1:7001        128   16777343
> >>>>>>  2   127.0.0.1:7002        128   16777343
> >>>>>>  3   127.0.0.1:7003        128   16777343
> >>>>>>  4   127.0.0.1:7004        128   16777343
> >>>>>>
> >>>>>> # dog node md info --all
> >>>>>> Id Size Used Avail Use% Path
> >>>>>> Node 0:
> >>>>>> 0 4.4 GB 4.0 MB 4.4 GB 0% /mnt/sheep/storage/0
> >>>>>> Node 1:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/1
> >>>>>> Node 2:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/2
> >>>>>> Node 3:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/3
> >>>>>> Node 4:
> >>>>>> 0 4.4 GB 0.0 MB 4.4 GB 0% /mnt/sheep/storage/4
> >>>>>>
> >>>>>
> >>>>> The very strange thing in your output is that only 1 copy was
> >>>>> actually written when you executed 'dog vdi create', even though
> >>>>> you formatted the cluster with two copies specified.
> >>>>>
> >>>>> You can verify this with:
> >>>>>
> >>>>> ls /mnt/sheep/storage/*/
> >>>>>
> >>>>> I guess you will only see one object. I don't know why this happened.
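> >>>>>
> >>>>> Something like this counts the objects on each disk (a sketch):
> >>>>>
> >>>>> for d in /mnt/sheep/storage/*/; do echo "$d: $(ls "$d" | wc -l)"; done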
> >>>>
> >>>> It is as you said:
> >>>> # ls /mnt/sheep/storage/*/
> >>>> /mnt/sheep/storage/0/:
> >>>> 80cac83600000000
> >>>>
> >>>> /mnt/sheep/storage/1/:
> >>>>
> >>>> /mnt/sheep/storage/2/:
> >>>>
> >>>> /mnt/sheep/storage/3/:
> >>>>
> >>>> /mnt/sheep/storage/4/:
> >>>>
> >>>>
> >>>> Now I'm on commit a79e69f9ad9c and the problem still exists for me
> >>>> (contrary to 0.7-stable). I noticed that the files "sheepdog_shm"
> >>>> and "lock" appeared in my /tmp. Is that correct?
> >>>>
> >
> > "lock" isn't created by the sheep daemon as far as I know. We create
> > sheepdog_locks for the local cluster driver.
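> >
> > A quick way to see what actually appeared there (a sketch):
> >
> > ls -l /tmp/sheepdog* /tmp/lock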
> >
> >>>
> >>> I suspect there is actually only one node in the cluster, so 'vdi
> >>> check' panics.
> >>>
> >>> Before you run 'vdi check':
> >>>
> >>> for i in `seq 0 4`; do dog cluster info -p 700$i; done
> >>>
> >>> Is the output the same on every node?
> >>>
> >>>
> >>> for i in `seq 0 4`; do dog node list -p 700$i; done
> >>>
> >>> The same here too?
> >>
> >> Hi!
> >> The output looks like this:
> >>
> >> # for i in `seq 0 4`; do dog cluster info -p 700$i; done
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:41  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >> Cluster status: running, auto-recovery enabled
> >>
> >> Cluster created at Wed Jan 8 09:42:40 2014
> >>
> >> Epoch Time           Version
> >> 2014-01-08 09:42:40  1 [127.0.0.1:7000, 127.0.0.1:7001,
> >> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>
> >> # for i in `seq 0 4`; do dog node list -p 700$i; done
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >> Id   Host:Port         V-Nodes   Zone
> >>  0   127.0.0.1:7000        128   16777343
> >>  1   127.0.0.1:7001        128   16777343
> >>  2   127.0.0.1:7002        128   16777343
> >>  3   127.0.0.1:7003        128   16777343
> >>  4   127.0.0.1:7004        128   16777343
> >>
> >>
> >
> > Everything looks fine. It is very weird. And with 5 nodes, only 1 copy
> > was written successfully. I have no idea what happened, and I can't
> > reproduce the problem on my local machine.
>
> I started only two sheep daemons and turned on the debug log level on
> the nodes. There is something that looks suspicious to me in the
> master's (port 7000) sheep.log:
> Jan 08 14:01:58 DEBUG [main] clear_client_info(826) connection seems to
> be dead
>
> I'm attaching the logs from both sheep daemons.
>
> Marcin
>
>
--
xmpp (jabber): marcin [at] mejor.pl
www: http://blog.mejor.pl/