[sheepdog-users] SIGABRT when doing: dog vdi check

Marcin Mirosław marcin@mejor.pl
Fri Jan 3 22:51:26 CET 2014


Hi!
I'm new to "sheep-running" ;) I'm just starting to try sheepdog, so I'm
probably doing many things wrong.
I'm playing with sheepdog-0.7.6.

First problem (SIGABRT):
I started multiple sheep daemons on localhost:
# for x in 0 1 2 3 4; do sheep -c local -j size=128M -p 700$x \
    /mnt/sheep/metadata/$x,/mnt/sheep/storage/$x; done
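
For a single instance, say x=0, the loop expands to the command below;
each daemon gets its own port and its own metadata,storage directory
pair:

# sheep -c local -j size=128M -p 7000 /mnt/sheep/metadata/0,/mnt/sheep/storage/0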

Next:
# dog cluster info
Cluster status: Waiting for cluster to be formatted

# dog cluster format -c 2:1
using backend plain store
# dog cluster info
Cluster status: running, auto-recovery enabled

Cluster created at Fri Jan  3 20:33:43 2014

Epoch Time           Version
2014-01-03 20:33:43      1 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]
# dog vdi create testowy 5G
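
A side note on the format command above: my understanding is that
"-c 2:1" requests erasure coding (2 data strips plus 1 parity strip per
object, so three distinct zones are needed to place one object), while a
bare number would request plain replication; please correct me if I read
the docs wrong:

# dog cluster format -c 2      # plain replication: 2 full copies
# dog cluster format -c 2:1    # erasure code: 2 data + 1 parity strips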
# gdb -q dog
Reading symbols from /usr/sbin/dog...Reading symbols from
/usr/lib64/debug/usr/sbin/dog.debug...done.
done.
(gdb)  set args  vdi check testowy
(gdb) run
Starting program: /usr/sbin/dog vdi check testowy
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
warning: File "/lib64/libthread_db-1.0.so" auto-loading has been
declined by your `auto-load safe-path' set to
"$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /lib64/libthread_db-1.0.so
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the
shell:
        info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library,
thread debugging will not be available.
PANIC: can't find next new idx

Program received signal SIGABRT, Aborted.
0x00007ffff784e2c5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff784e2c5 in raise () from /lib64/libc.so.6
#1  0x00007ffff784f748 in abort () from /lib64/libc.so.6
#2  0x000055555556192c in get_vnode_next_idx (nr_prev_idxs=<optimized
out>, prev_idxs=<optimized out>, nr_entries=<optimized out>,
    entries=<optimized out>) at ../include/sheep.h:105
#3  oid_to_vnodes (vnodes=0x7fffffffe0c0, nr_copies=2, oid=<optimized
out>, nr_entries=320, entries=<optimized out>)
    at ../include/sheep.h:174
#4  oid_to_vnodes (vnodes=0x7fffffffe0c0, nr_copies=2, oid=<optimized
out>, nr_entries=320, entries=<optimized out>) at vdi.c:1586
#5  queue_vdi_check_work (oid=<optimized out>, done=done@entry=0x0,
wq=wq@entry=0x555556321420, inode=0x7ffff6c13010, inode=0x7ffff6c13010)
    at vdi.c:1600
#6  0x00005555555632e8 in vdi_check (argc=<optimized out>,
argv=<optimized out>) at vdi.c:1634
#7  0x000055555555bc77 in main (argc=4, argv=0x7fffffffe308) at dog.c:436
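
To help myself understand the panic, I sketched what I imagine
get_vnode_next_idx around ../include/sheep.h:105 does. This is only my
guess from the message and the backtrace, not the real source; the
struct layout and the panic() macro below are my own assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct sd_vnode { uint32_t zone; };  /* assumed: only the field I need */

#define panic(msg) do { fprintf(stderr, "PANIC: %s\n", msg); abort(); } while (0)

/* Walk the vnode ring from the last chosen index, looking for a vnode
 * whose zone differs from every previously chosen one; give up when the
 * whole ring is exhausted without finding a fresh zone. */
static int get_vnode_next_idx(const struct sd_vnode *entries, int nr_entries,
                              const int *prev_idxs, int nr_prev_idxs)
{
        int idx = prev_idxs[nr_prev_idxs - 1];

        for (int i = 0; i < nr_entries; i++) {
                bool used = false;

                idx = (idx + 1) % nr_entries;
                for (int j = 0; j < nr_prev_idxs; j++)
                        if (entries[idx].zone == entries[prev_idxs[j]].zone)
                                used = true;
                if (!used)
                        return idx;      /* vnode in a zone not used yet */
        }
        panic("can't find next new idx"); /* every zone already taken */
}

If that reading is anywhere near right, the abort would mean dog could
not place the strips of one object on enough distinct zones, which may
matter here since all five of my daemons run on 127.0.0.1.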

Second problem:
Using the previously created vdi, I mount sheepfs:
# sheepfs  /mnt/test/
Next:
# echo testowy >/mnt/test/vdi/mount
# mkfs.ext4 -q /mnt/test/volume/testowy
/mnt/test/volume/testowy is not a block special device.
Proceed anyway? (y,n) y
# mount -o noatime /mnt/test/volume/testowy /mnt/sheep_test/
# dd if=/dev/zero of=/mnt/sheep_test/zeroes bs=1M count=50
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0,108502 s, 483 MB/s
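
In hindsight I should have recorded a checksum of the file while the
cluster was still healthy, so that the check below would have a baseline
to compare against, e.g.:

# md5sum /mnt/sheep_test/zeroes | tee /tmp/zeroes.md5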

Next I stop one sheep daemon; the new situation is as below:
# dog cluster  info
Cluster status: running, auto-recovery enabled

Cluster created at Fri Jan  3 20:33:43 2014

Epoch Time           Version
2014-01-03 21:02:40      2 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003]
2014-01-03 20:33:43      1 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004]

Now my test file is inaccessible:
# md5sum /mnt/sheep_test/zeroes
md5sum: /mnt/sheep_test/zeroes: Input/output error

# cat /mnt/test/vdi/list
  Name     Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  testowy   0  5.0 GB  272 MB  0.0 MB  2014-01-03 20:34  cac836       2


Shouldn't my vdi "testowy" still be available even when one node is
down? (I'll attach sheep.log at the end of this email.)


I'd also like to ask you for advice about a proper (for my purposes)
configuration of a sheep "cluster". I'd like to build a one-node storage
box for keeping backups. I'm going to use a few HDDs (from 2 to 5
units), so I think I need "Multi-disk on Single Node Support". I'd like
enough redundancy to survive one HDD failure (I'm thinking about
"Erasure Code Support" with a 2:1 or 4:1 scheme). I'd also like the
flexibility to add or remove an HDD in sheepdog's cluster, though I
suspect that kind of flexibility isn't possible. After reading the wiki
I think almost everything above is doable; am I right?
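
If I do the math right (based on my reading of the wiki, so please
verify): an x:y scheme splits every object into x data strips plus y
parity strips, needs at least x+y disks/zones, survives y failures, and
gives x/(x+y) usable capacity. For my two candidates:

  2:1 over 3 disks: survives 1 failure, 2/3 = ~67% usable capacity
  4:1 over 5 disks: survives 1 failure, 4/5 = 80% usable capacity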

Should I use one daemon per node, or multiple sheep daemons on one
node? I think one daemon is enough, but the wiki says: "You need at
least X alive nodes (e.g, 4 nodes in 4:2 scheme) to serve the read/write
request. If number of nodes drops to below X, the cluster will deny of
service. Note that if you only have X nodes in the cluster, it means you
don't have any redundancy parity generated."
So I'm not sure whether I should configure one daemon or multiple
daemons; a sketch of the single-daemon variant follows.
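
If the single-daemon, multi-disk route is the right one, I guess the
invocation would look roughly like the line below (the paths are only
examples), with the first path holding metadata and the rest being data
disks:

# sheep -c local /var/lib/sheepdog,/mnt/disk1,/mnt/disk2,/mnt/disk3

and, if I understand the MD feature, disks could later be added or
removed at runtime with something like:

# dog node md plug /mnt/disk4
# dog node md unplug /mnt/disk1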

My last question is about data checksumming. Is it better to put sheep
on ext4 and run btrfs/zfs inside the VDI, or to put sheep on btrfs and
use ext4 on top of the VDI?

Thanks!
Marcin



