[sheepdog-users] [0.7-stable] SIGABRT when doing: dog vdi check

Hitoshi Mitake mitake.hitoshi at gmail.com
Tue Jan 7 15:32:08 CET 2014


At Tue, 07 Jan 2014 15:19:05 +0100,
Marcin Mirosław wrote:
> 
> On 07.01.2014 14:19, Hitoshi Mitake wrote:
> > At Tue, 07 Jan 2014 10:45:33 +0100,
> > Marcin Mirosław wrote:
> >>
> >> On 07.01.2014 02:57, Hitoshi Mitake wrote:
> >>> At Mon, 06 Jan 2014 11:41:16 +0900,
> >>> Hitoshi Mitake wrote:
> >>>>
> >>>> At Sat, 4 Jan 2014 13:28:27 +0800,
> >>>> Liu Yuan wrote:
> >>>>>
> >>>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:
> >>>>>> Hi!
> >>>>>> I'm new on "sheep-run";) I'm starting to try sheepdog so probably I'm
> >>>>>> doing many things wrongly.
> >>>>>> I'm playing with sheepdog-0.7.6.
> >>>>>>
> >>>>>> First problem (SIGABRT):
> >>>>>> I started multiple sheep daemons on localhost:
> >>>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x
> >>>>>> /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
> >>>>>>
> >>>>>> Next:
> >>>>>> # dog cluster info
> >>>>>> Cluster status: Waiting for cluster to be formatted
> >>>>>>
> >>>>>> # dog cluster format -c 2:1
> >>>>>
> >>>>> 0.7.6 doesn't support erasure code. Try latest master branch
> >>>>
> >>>> Current stable-0.7 doesn't treat copy policies like "2:1" as an
> >>>> error. It would be confusing for users. I'll write a patch only for
> >>>> stable-0.7 for treating these options as errors.
> >>>>
> >>>>>
> >>>>>> using backend plain store
> >>>>>> # dog cluster info
> >>>>>> Cluster status: running, auto-recovery enabled
> >>>>>>
> >>>>>> Cluster created at Fri Jan  3 20:33:43 2014
> >>>>>>
> >>>>>> Epoch Time           Version
> >>>>>> 2014-01-03 20:33:43      1 [127.0.0.1:7000, 127.0.0.1:7001,
> >>>>>> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
> >>>>>> # dog vdi create testowy 5G
> >>>>>> # gdb -q dog
> >>>>>> Reading symbols from /usr/sbin/dog...Reading symbols from
> >>>>>> /usr/lib64/debug/usr/sbin/dog.debug...done.
> >>>>>> done.
> >>>>>> (gdb)  set args  vdi check testowy
> >>>>>> (gdb) run
> >>>>>> Starting program: /usr/sbin/dog vdi check testowy
> >>>>>> warning: Could not load shared library symbols for linux-vdso.so.1.
> >>>>>> Do you need "set solib-search-path" or "set sysroot"?
> >>>>>> warning: File "/lib64/libthread_db-1.0.so" auto-loading has been
> >>>>>> declined by your `auto-load safe-path' set to
> >>>>>> "$debugdir:$datadir/auto-load".
> >>>>>> To enable execution of this file add
> >>>>>>         add-auto-load-safe-path /lib64/libthread_db-1.0.so
> >>>>>> line to your configuration file "/root/.gdbinit".
> >>>>>> To completely disable this security protection add
> >>>>>>         set auto-load safe-path /
> >>>>>> line to your configuration file "/root/.gdbinit".
> >>>>>> For more information about this security protection see the
> >>>>>> "Auto-loading safe path" section in the GDB manual.  E.g., run from the
> >>>>>> shell:
> >>>>>>         info "(gdb)Auto-loading safe path"
> >>>>>> warning: Unable to find libthread_db matching inferior's thread library,
> >>>>>> thread debugging will not be available.
> >>>>>> PANIC: can't find next new idx
> >>>>>
> >>>>> seems that 0.7.x series is cracky about it. Hitoshi, can you verify
> >>>>> this?
> >>>>
> >>>> OK, I'll dig in the problem soon.
> >>>
> >>> Hi Marcin,
> >>>
> >>> I've fixed a serious bug which might have caused the crash you
> >>> hit. The commit is this:
> >>> https://github.com/sheepdog/sheepdog/commit/b82632b47978315a50e9ba0bbad59f56453f63f5
> >>>
> >>> The patch is already backported to stable-0.7. Could you try the
> >>> latest stable-0.7? You can obtain the source code with the steps below:
> >>>
> >>> git clone https://github.com/sheepdog/sheepdog.git
> >>> cd sheepdog
> >>> git checkout -b 0.7 origin/stable-0.7
> >>>
> >>> But the problem might depend heavily on timing, so reproducing it
> >>> will not be easy... If you have time, I'd like you to try
> >>> the latest change.
> >>
> >> Hi all!
> >> Today I'm trying sheepdog on another computer and don't have any problem
> >> reproducing the SIGABRT. I tried the latest 0.7 stable (at 9de5329978) and "dog
> >> vdi check testowy" still crashes:
> >> # dog  vdi check testowy
> >> PANIC: can't find next new idx
> >> dog exits unexpectedly (Aborted).
> >> dog() [0x40536a]
> >> /lib64/libpthread.so.0(+0xfd8f) [0x7f90aec31d8f]
> >> /lib64/libc.so.6(gsignal+0x38) [0x7f90ae8b2368]
> >> /lib64/libc.so.6(abort+0x147) [0x7f90ae8b36c7]
> >> dog() [0x40893a]
> >> dog() [0x40a3d9]
> >> dog() [0x40363e]
> >> /lib64/libc.so.6(__libc_start_main+0xf4) [0x7f90ae89ec04]
> >> dog() [0x4038a8]
> >> #7  0x000000000040faa5 in sd_backtrace () at logger.c:914
> >>
> >> addrs = {0x40f95d <sd_backtrace+29>, 0x40536b <crash_handler+43>,
> >> 0x7f90aec31d90 <__restore_rt>, 0x7f90ae8b2369 <__GI_raise+57>,
> >> 0x7f90ae8b36c8 <__GI_abort+328>, 0x40893b, 0x40a3da <vdi_check+186>,
> >> 0x40363f <main+1055>, 0x7f90ae89ec05 <__libc_start_main+245>,
> >> 0x4038a9 <_start+41>, 0x0 <repeats 1014 times>}
> >>
> >> i = <optimized out>
> >>
> >> n = <optimized out>
> >>
> >> __func__ = "sd_backtrace"
> >>
> >> #8  0x000000000040536b in crash_handler (signo=6) at dog.c:330
> >>
> >> __func__ = "crash_handler"
> >>
> >> #9  <signal handler called>
> >>
> >> No locals.
> >>
> >> #10 0x00007f90ae8b2369 in __GI_raise (sig=sig@entry=6) at
> >> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> >>
> >> resultvar = 0
> >>
> >> pid = 21938
> >>
> >> selftid = 21938
> >>
> >> #11 0x00007f90ae8b36c8 in __GI_abort () at abort.c:90
> >>
> >> save_stage = 2
> >>
> >> act = {__sigaction_handler = {sa_handler = 0x2, sa_sigaction = 0x2},
> >> sa_mask = {__val = {29029392, 102, 140259386245127, 5, 0, 208,
> >> 140259380182888, 0, 102, 320, 140259386270773, 0, 140259383961520, 0,
> >> 140259383961520, 140259383955552}}, sa_flags = -1358768064,
> >> sa_restorer = 0x7f90ae9a3e80 <__memcpy_ssse3+9600>}
> >>
> >> sigs = {__val = {32, 0 <repeats 15 times>}}
> >>
> >> #12 0x000000000040893b in get_vnode_next_idx (nr_prev_idxs=<optimized
> >> out>, prev_idxs=<optimized out>, nr_entries=<optimized out>,
> >> entries=<optimized out>) at ../include/sheep.h:105
> >>
> >> i = <optimized out>
> >>
> >> idx = <optimized out>
> >>
> >> first_idx = <optimized out>
> >>
> >> found = <optimized out>
> >>
> >> #13 oid_to_vnodes (vnodes=0x7fffaefac8c0, nr_copies=2, oid=<optimized
> >> out>, nr_entries=320, entries=<optimized out>) at ../include/sheep.h:174
> >>
> >> idxs = {102, 0, 29029184, 0, 1, 0, 29029264, 0}
> >>
> >> i = <optimized out>
> >>
> >> vnodes = 0x7fffaefac8c0
> >>
> >> nr_copies = 2
> >>
> >> oid = <optimized out>
> >>
> >> nr_entries = 320
> >>
> >> #14 queue_vdi_check_work (oid=<optimized out>, done=done@entry=0x0,
> >> wq=wq@entry=0x1baf410, inode=0x7f90adc77010, inode=0x7f90adc77010) at
> >> vdi.c:1600
> >>
> >> info = 0x1baf580
> >>
> >> tgt_vnodes = {0xc2fd70 <sd_vnodes+5712>, 0x1baf3b8, 0x7f90adc77010,
> >> 0x414348 <create_worker_threads+104>, 0x7fffaefae207, 0x7f90adc76700,
> >> 0x1baf340, 0x1baf390}
> >>
> >> nr_copies = 2
> >>
> >> #15 0x000000000040a3da in vdi_check (argc=<optimized out>,
> >> argv=<optimized out>) at vdi.c:1634
> >>
> >> vdiname = 0x7fffaefae207 "testowy"
> >>
> >> ret = 0
> >>
> >> max_idx = <optimized out>
> >>
> >> done = 0
> >>
> >> vid = 13289526
> >>
> >> inode = 0x7f90adc77010
> >>
> >> wq = 0x1baf410
> >>
> >> __func__ = "vdi_check"
> >>
> >> #16 0x000000000040363f in main (argc=<optimized out>,
> >> argv=0x7fffaefacad8) at dog.c:494
> >>
> >> ch = <optimized out>
> >>
> >> longindex = 0
> >>
> >> ret = <optimized out>
> >>
> >> flags = 2
> >>
> >> long_options = 0xc2c560 <lopts.2390>
> >>
> >> commands = 0x1ba3010
> >>
> >> short_options = 0xc2e580 <sopts.2381> "s:a:p:h"
> >>
> >> p = 0x41594d
> >> <__libc_csu_init+77> "H\203\303\001H9\353u\352H\203\304\b[]A\\A]A^A_\303ff.\017\037\204"
> >>
> >> __func__ = "main"
> >>
> >>
> >> Additionally I'm getting QA Notice from portage:
> >>  * QA Notice: Package triggers severe warnings which indicate that it
> >>  *            may exhibit random runtime failures.
> >>  * group.c:348:45: warning: argument to ‘sizeof’ in ‘memcpy’ call is the
> >> same expression as the destination; did you mean to dereference it?
> >> [-Wsizeof-pointer-memaccess]
> >>
> >>
> >>
> >> Today I'm using gcc-4.8.2, but I also tested with clang and nothing changed.
> >> I'm configuring sheepdog using:
> >> # configure --prefix=/usr --build=x86_64-pc-linux-gnu
> >> --host=x86_64-pc-linux-gnu --mandir=/usr/share/man
> >> --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
> >> --localstatedir=/var/lib --libdir=/usr/lib64 --disable-silent-rules
> >> --disable-dependency-tracking --docdir=/usr/share/doc/sheepdog-0.7.9999
> >> --disable-corosync --enable-sheepfs
> >>
> >> Dir "/mnt/sheep" is placed on one partition (I know this is slow but
> >> speed isn't important for me at this moment).
> >> If I can do something more to help please let me know.
> > 
> > Thanks a lot for your testing and detailed report, Marcin.
> > 
> > I backported an important patch for the above problem. Could you try
> > the latest stable-0.7?
> > 
> > BTW, it depends on sizeof(time_t). Could you try the program below in
> > your environment?
> > 
> > #include <stdio.h>
> > #include <time.h>
> > 
> > int main(void)
> > {
> > 	/* sizeof yields size_t, so %zu is the matching format specifier */
> > 	printf("sizeof(time_t): %zu\n", sizeof(time_t));
> > 	printf("sizeof(time_t *): %zu\n", sizeof(time_t *));
> > 	return 0;
> > }
> > 
> > If the sizes are different, the patch would fix the problem.
> 
> Sizes are the same:
> 
> sizeof(time_t): 8
> sizeof(time_t *): 8

Hmm, thanks.

> 
> 
> I've tested sheepdog at commit 984afc61c16e69 and it's better:) :
> # dog vdi check testowy
> failed to connect to 0:0:7f00:1:591b:::0: Network is unreachable
> failed to connect to 0:0:7f00:1:591b:::0: Network is unreachable
>  13.6 % [==================>
> 
>  ] 696 MB / 5.0 GB    failed to connect to 5b1b:::0: Network is unreachable
> / # ============================================>
> 
> So the SIGABRT didn't appear; only the terminal was polluted in this test;)
> I ran it once again with an explicit IP address:
> # dog vdi check  -a 127.0.0.1 testowy
>  48.8 %
> [==================================================================>
>                                                                  ] 2.4
> GB / 5.0 GB    failed to connect to 5b1b:::0: Network is unreachable
> failed to connect to 5b1b:::0: Network is unreachable
>  48.8 %
> [==================================================================>
>                                                                  ] 2.4
> GB / 5.0 GB    failed to connect to 0:0:7f00:1:591b:::0: Network is
> unreachable
> failed to connect to 0:0:7f00:1:591b:::0: Network is unreachable
> 
> I don't know why dog wants to connect over IPv6.

Thanks for your testing!

Broken IP addresses like these show up when the latest dog is talking to
a sheep from the stable branch. Are you perhaps using the latest dog?

Thanks,
Hitoshi




