[sheepdog-users] SIGABRT when doing: dog vdi check

Marcin Mirosław marcin at mejor.pl
Tue Jan 7 10:45:33 CET 2014


On 07.01.2014 02:57, Hitoshi Mitake wrote:
> At Mon, 06 Jan 2014 11:41:16 +0900,
> Hitoshi Mitake wrote:
>>
>> At Sat, 4 Jan 2014 13:28:27 +0800,
>> Liu Yuan wrote:
>>>
>>> On Fri, Jan 03, 2014 at 10:51:26PM +0100, Marcin Mirosław wrote:
>>>> Hi!
>>>> I'm new to the "sheep-run" ;) I'm just starting to try sheepdog, so I'm
>>>> probably doing many things wrong.
>>>> I'm playing with sheepdog-0.7.6.
>>>>
>>>> First problem (SIGABRT):
>>>> I started multiple sheep daemons on localhost:
>>>> # for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
>>>>
>>>> Next:
>>>> # dog cluster info
>>>> Cluster status: Waiting for cluster to be formatted
>>>>
>>>> # dog cluster format -c 2:1
>>>
>>> 0.7.6 doesn't support erasure coding. Try the latest master branch.
>>
>> Current stable-0.7 doesn't treat copy policies like "2:1" as an
>> error, which could confuse users. I'll write a patch, only for
>> stable-0.7, that treats these options as errors.
>>
>>>
>>>> using backend plain store
>>>> # dog cluster info
>>>> Cluster status: running, auto-recovery enabled
>>>>
>>>> Cluster created at Fri Jan  3 20:33:43 2014
>>>>
>>>> Epoch Time           Version
>>>> 2014-01-03 20:33:43      1 [127.0.0.1:7000, 127.0.0.1:7001,
>>>> 127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
>>>> # dog vdi create testowy 5G
>>>> # gdb -q dog
>>>> Reading symbols from /usr/sbin/dog...Reading symbols from
>>>> /usr/lib64/debug/usr/sbin/dog.debug...done.
>>>> done.
>>>> (gdb)  set args  vdi check testowy
>>>> (gdb) run
>>>> Starting program: /usr/sbin/dog vdi check testowy
>>>> warning: Could not load shared library symbols for linux-vdso.so.1.
>>>> Do you need "set solib-search-path" or "set sysroot"?
>>>> warning: File "/lib64/libthread_db-1.0.so" auto-loading has been
>>>> declined by your `auto-load safe-path' set to
>>>> "$debugdir:$datadir/auto-load".
>>>> To enable execution of this file add
>>>>         add-auto-load-safe-path /lib64/libthread_db-1.0.so
>>>> line to your configuration file "/root/.gdbinit".
>>>> To completely disable this security protection add
>>>>         set auto-load safe-path /
>>>> line to your configuration file "/root/.gdbinit".
>>>> For more information about this security protection see the
>>>> "Auto-loading safe path" section in the GDB manual.  E.g., run from the
>>>> shell:
>>>>         info "(gdb)Auto-loading safe path"
>>>> warning: Unable to find libthread_db matching inferior's thread library,
>>>> thread debugging will not be available.
>>>> PANIC: can't find next new idx
>>>
>>> It seems the 0.7.x series is flaky here. Hitoshi, can you verify
>>> this?
>>
>> OK, I'll dig into the problem soon.
> 
> Hi Marcin,
> 
> I've fixed a serious bug that might cause the problem you've
> hit. The commit is this:
> https://github.com/sheepdog/sheepdog/commit/b82632b47978315a50e9ba0bbad59f56453f63f5
> 
> The patch is already backported to stable-0.7. Could you try the
> latest stable-0.7? You can obtain the source code with these steps:
> 
> git clone https://github.com/sheepdog/sheepdog.git
> cd sheepdog
> git checkout -b 0.7 origin/stable-0.7
> 
> But the problem may depend heavily on timing, so reproducing it
> will not be easy... If you have time, I'd like you to try the
> latest change.

Hi all!
Today I'm trying sheepdog on another computer, and I have no trouble
reproducing the SIGABRT. I tried the latest stable-0.7 (at 9de5329978), and
"dog vdi check testowy" still crashes:
# dog  vdi check testowy
PANIC: can't find next new idx
dog exits unexpectedly (Aborted).
dog() [0x40536a]
/lib64/libpthread.so.0(+0xfd8f) [0x7f90aec31d8f]
/lib64/libc.so.6(gsignal+0x38) [0x7f90ae8b2368]
/lib64/libc.so.6(abort+0x147) [0x7f90ae8b36c7]
dog() [0x40893a]
dog() [0x40a3d9]
dog() [0x40363e]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x7f90ae89ec04]
dog() [0x4038a8]
#7  0x000000000040faa5 in sd_backtrace () at logger.c:914

addrs = {0x40f95d <sd_backtrace+29>, 0x40536b <crash_handler+43>,
0x7f90aec31d90 <__restore_rt>, 0x7f90ae8b2369 <__GI_raise+57>,
0x7f90ae8b36c8 <__GI_abort+328>, 0x40893b, 0x40a3da <vdi_check+186>,
0x40363f <main+1055>, 0x7f90ae89ec05 <__libc_start_main+245>,
0x4038a9 <_start+41>, 0x0 <repeats 1014 times>}

i = <optimized out>

n = <optimized out>

__func__ = "sd_backtrace"

#8  0x000000000040536b in crash_handler (signo=6) at dog.c:330

__func__ = "crash_handler"

#9  <signal handler called>

No locals.

#10 0x00007f90ae8b2369 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56

resultvar = 0

pid = 21938

selftid = 21938

#11 0x00007f90ae8b36c8 in __GI_abort () at abort.c:90

save_stage = 2

act = {__sigaction_handler = {sa_handler = 0x2, sa_sigaction = 0x2},
sa_mask = {__val = {29029392, 102, 140259386245127, 5, 0, 208,
140259380182888, 0, 102, 320, 140259386270773, 0, 140259383961520, 0,
140259383961520, 140259383955552}}, sa_flags = -1358768064,
sa_restorer = 0x7f90ae9a3e80 <__memcpy_ssse3+9600>}

sigs = {__val = {32, 0 <repeats 15 times>}}

#12 0x000000000040893b in get_vnode_next_idx (nr_prev_idxs=<optimized
out>, prev_idxs=<optimized out>, nr_entries=<optimized out>,
entries=<optimized out>) at ../include/sheep.h:105

i = <optimized out>

idx = <optimized out>

first_idx = <optimized out>

found = <optimized out>

#13 oid_to_vnodes (vnodes=0x7fffaefac8c0, nr_copies=2, oid=<optimized
out>, nr_entries=320, entries=<optimized out>) at ../include/sheep.h:174

idxs = {102, 0, 29029184, 0, 1, 0, 29029264, 0}

i = <optimized out>

vnodes = 0x7fffaefac8c0

nr_copies = 2

oid = <optimized out>

nr_entries = 320

#14 queue_vdi_check_work (oid=<optimized out>, done=done@entry=0x0,
wq=wq@entry=0x1baf410, inode=0x7f90adc77010, inode=0x7f90adc77010) at
vdi.c:1600

info = 0x1baf580

tgt_vnodes = {0xc2fd70 <sd_vnodes+5712>, 0x1baf3b8, 0x7f90adc77010,
0x414348 <create_worker_threads+104>, 0x7fffaefae207, 0x7f90adc76700,
0x1baf340, 0x1baf390}

nr_copies = 2

#15 0x000000000040a3da in vdi_check (argc=<optimized out>,
argv=<optimized out>) at vdi.c:1634

vdiname = 0x7fffaefae207 "testowy"

ret = 0

max_idx = <optimized out>

done = 0

vid = 13289526

inode = 0x7f90adc77010

wq = 0x1baf410

__func__ = "vdi_check"

#16 0x000000000040363f in main (argc=<optimized out>,
argv=0x7fffaefacad8) at dog.c:494

ch = <optimized out>

longindex = 0

ret = <optimized out>

flags = 2

long_options = 0xc2c560 <lopts.2390>

commands = 0x1ba3010

short_options = 0xc2e580 <sopts.2381> "s:a:p:h"

p = 0x41594d
<__libc_csu_init+77> "H\203\303\001H9\353u\352H\203\304\b[]A\\A]A^A_\303ff.\017\037\204"

__func__ = "main"


Additionally, I'm getting a QA Notice from Portage:
 * QA Notice: Package triggers severe warnings which indicate that it
 *            may exhibit random runtime failures.
 * group.c:348:45: warning: argument to ‘sizeof’ in ‘memcpy’ call is the
same expression as the destination; did you mean to dereference it?
[-Wsizeof-pointer-memaccess]



Today I built it with gcc-4.8.2, but I also tested with clang and nothing changed.
I'm configuring sheepdog using:
# configure --prefix=/usr --build=x86_64-pc-linux-gnu \
--host=x86_64-pc-linux-gnu --mandir=/usr/share/man \
--infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc \
--localstatedir=/var/lib --libdir=/usr/lib64 --disable-silent-rules \
--disable-dependency-tracking --docdir=/usr/share/doc/sheepdog-0.7.9999 \
--disable-corosync --enable-sheepfs

The directory "/mnt/sheep" is on a single partition (I know this is slow, but
speed isn't important to me at the moment).
If I can do anything more to help, please let me know.

Marcin




