[sheepdog] [sheepdog-users] [ANNOUNCE] sheepdog stable release v0.7.3-rc0

Gerald Richter - ECOS richter at ecos.de
Sun Sep 8 09:57:36 CEST 2013


More information: I tried to start the sheep again and I got:

....
Sep 08 09:48:06   INFO [main] replay_journal_entry(156) /var/lib/sheepdog//disc2/data/000b30f500001055, size 4194304, off 0, 1
Sep 08 09:48:13   INFO [main] replay_journal_entry(156) /var/lib/sheepdog//disc2/data/000b30f500001058, size 4194304, off 0, 1
Sep 08 09:48:14  ERROR [main] for_each_object_in_stale(383) /var/lib/sheepdog//disc1/data/.stale
Sep 08 09:48:14  EMERG [main] crash_handler(250) sheep exits unexpectedly (Aborted).

Both sheep processes are still running, but dog hangs when it tries to do anything. I tried to restart again, but I still get the following and dog hangs:

Sep 08 09:52:25   INFO [main] md_add_disk(141) /var/lib/sheepdog//disc1/data, nr 1
Sep 08 09:52:25   INFO [main] md_add_disk(141) /var/lib/sheepdog//disc2/data, nr 2
Sep 08 09:52:25   INFO [main] send_join_request(770) IPv4 ip:1.2.3.4 port:7000
Sep 08 09:52:28  ERROR [main] for_each_object_in_stale(383) /var/lib/sheepdog//disc1/data/.stale
Sep 08 09:52:28  EMERG [main] crash_handler(250) sheep exits unexpectedly (Aborted).

After this, dog hangs, but the second node still shows two active nodes, and both sheep processes are still running on the machine that failed.
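For what it's worth, the abort always follows the ERROR line from for_each_object_in_stale() on the .stale directory of disc1. A minimal sketch of a directory walk in that style (illustrative only, not sheepdog's actual code; the function name and callback are assumptions) shows how a failure to open the directory becomes an error return that the caller may escalate to an abort:

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Illustrative sketch only (not sheepdog's real implementation; the
 * name and callback are assumptions): walk every entry of a disk's
 * .stale directory.  If opendir() fails -- e.g. the directory is
 * missing or unreadable -- log an ERROR with the path, as in the log
 * above, and return a failure the caller may treat as fatal.
 */
static int for_each_stale_object(const char *stale_dir,
				 int (*fn)(const char *name))
{
	DIR *dir = opendir(stale_dir);
	struct dirent *d;
	int ret = 0;

	if (!dir) {
		fprintf(stderr, "ERROR for_each_stale_object %s\n", stale_dir);
		return -1;	/* caller may abort on this */
	}
	while ((d = readdir(dir)) != NULL) {
		if (d->d_name[0] == '.')
			continue;	/* skip ".", ".." and hidden entries */
		ret = fn(d->d_name);
		if (ret < 0)
			break;
	}
	closedir(dir);
	return ret;
}
```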

Regards

Gerald
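P.S.: the segfault in the backtrace quoted below lands in alloc_bitmap() at bitops.h:46, in the memset() that zeroes the newly appended tail of a grown bitmap. A minimal sketch of that grow-and-zero pattern (illustrative code under assumed names and size handling, not sheepdog's actual implementation) shows where such a memset can write past the allocation, e.g. if the byte offset is applied to an unsigned long pointer without a cast:

```c
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Round a bit count up to whole unsigned longs, returned in bytes. */
static size_t bitmap_bytes(size_t nr_bits)
{
	return (nr_bits + BITS_PER_LONG - 1) / BITS_PER_LONG
		* sizeof(unsigned long);
}

/*
 * Illustrative sketch of an alloc_bitmap()-style helper: grow a bitmap
 * and zero only the newly appended region.  The memset() below is the
 * kind of line the trace points at (bitops.h:46).  Note the char * cast:
 * computing `new_bmap + old_size` on an unsigned long * would scale the
 * byte offset by sizeof(unsigned long) and overshoot the buffer.
 */
static unsigned long *grow_bitmap(unsigned long *old_bmap,
				  size_t old_bits, size_t new_bits)
{
	size_t old_size = bitmap_bytes(old_bits);
	size_t new_size = bitmap_bytes(new_bits);
	unsigned long *new_bmap = realloc(old_bmap, new_size);

	if (!new_bmap)
		return NULL;
	memset((char *)new_bmap + old_size, 0, new_size - old_size);
	return new_bmap;
}
```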

> -----Original Message-----
> From: Gerald Richter On behalf of Gerald Richter - ECOS
> Sent: Sunday, September 8, 2013 09:47
> To: 'Hitoshi Mitake'; 'sheepdog-users at lists.wpkg.org';
> 'sheepdog at lists.wpkg.org'
> Subject: RE: [sheepdog-users] [ANNOUNCE] sheepdog stable release
> v0.7.3-rc0
> 
> Hi,
> 
> regarding the segfault I mentioned on Friday: there are two nodes, the
> cluster was formatted with copies = 3, and I am using corosync. I don't
> have much more information anymore, because I had to import data over
> the weekend.
> 
> During the data import sheep crashed again. It crashed while I had two
> qemu-img convert processes running at the same time (with different
> VDIs, of course) on the same machine. I am still using 2 nodes. The
> network between the two nodes is only 100 Mbit/s, so I get these poll
> timeouts from time to time, but that wasn't a problem before. Here is
> the output of sheep.log (the sheep on the machine where no import
> happened is still running).
> 
> Regards
> 
> Gerald
> 
> Sep 07 21:50:56   WARN [gway 130685] wait_forward_request(177) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Sep 07 21:51:00   WARN [gway 130746] wait_forward_request(177) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Sep 07 21:51:01   WARN [gway 130685] wait_forward_request(177) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Sep 07 21:51:01   WARN [gway 130200] wait_forward_request(177) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Sep 07 21:58:50  EMERG [gway 131160] crash_handler(250) sheep exits unexpectedly (Segmentation fault).
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(843) sheep.c:252: crash_handler
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(857) /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7fda93b9d02f]
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(857) /lib/x86_64-linux-gnu/libc.so.6(+0x83fa3) [0x7fda931fafa3]
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(843) bitops.h:46: alloc_bitmap
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(857) /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7fda93b94b4f]
> Sep 07 21:58:52  EMERG [gway 131160] sd_backtrace(857) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7fda93251a7c]
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #6  0x000000000041f4a5 in sd_backtrace () at logger.c:862
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) 862
> 	dump_stack_frames();
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) addrs =
> {0x41f36d, 0x4058b8, 0x7fda93b9d030, 0x7fda931fafa4, 0x423faf,
> 0x7fda93b94b50, 0x7fda93251a7d, 0x0 <repeats 1017 times>}
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) i =
> <optimized out>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) n =
> <optimized out>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) __func__ =
> "sd_backtrace"
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #7
> 0x00000000004058b8 in crash_handler (signo=11) at sheep.c:252
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) 252
> 	sd_backtrace();
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) __func__ =
> "crash_handler"
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #8  <signal
> handler called>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) No symbol
> table info available.
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #9
> 0x00007fda931fafa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) No symbol
> table info available.
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #10
> 0x0000000000423faf in alloc_bitmap (new_bits=262144, old_bits=<optimized
> out>, old_bmap=<optimized out>) at ../include/bitops.h:46
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) 46
> 	memset(new_bmap + old_size, 0, new_size - old_size);
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) old_size =
> <optimized out>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) new_size =
> 32768
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) new_bmap
> = <optimized out>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #11
> worker_routine (arg=0x1b90420) at work.c:264
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) 264
> 	tid_map = alloc_bitmap(tid_map, old_tid_max, tid_max);
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) wi =
> 0x1b90420
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) work =
> <optimized out>
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) tid = 131160
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) __func__ =
> "worker_routine"
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #12
> 0x00007fda93b94b50 in start_thread () from /lib/x86_64-linux-
> gnu/libpthread.so.0
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) No symbol
> table info available.
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #13
> 0x00007fda93251a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) No symbol
> table info available.
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(790) #14
> 0x0000000000000000 in ?? ()
> 
> Sep 07 21:59:01  EMERG [gway 131160] dump_stack_frames(804) No symbol
> table info available.
> 
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(731) dump __sys
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(734) $1 = {cdrv = 0x63a520, cdrv_option = 0x0, this_node = {nid = {addr = '\000' <repeats 12 times>"\260, \to\222", port = 7000, io_addr = '\000' <repeats 15 times>, io_port = 0, pad = "\000\000\000"}, nr_vnodes = 64, zone = 2456750512, space = 4099542974464},
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739)  cinfo = {proto_ver = 8 '\b', disable_recovery = 0 '\000', nr_nodes = 2, epoch = 3, ctime = 5920862273014739504, flags = 1, nr_copies = 3 '\003', status = SD_STATUS_OK, __pad = 0, store = "plain\000\000\000\000\000\000\000\000\000\000", nodes = {{nid = {a
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) ddr = '\000' <repeats 12 times>"\260, \to\222", port = 7000, io_addr = '\000' <repeats 15 times>, io_port = 0, pad = "\000\000\000"}, nr_vnodes = 56, zone = 2456750512, space = 4099542974464}, {nid = {addr = '\000' <repeats 12 times>"\260, \tzO", port = 7
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) 000, io_addr = '\000' <repeats 15 times>, io_port = 0, pad = "\000\000\000"}, nr_vnodes = 72, zone = 1333397936, space = 5322119831552}, {nid = {addr = '\000' <repeats 15 times>, port = 0, io_addr = '\000' <repeats 15 times>, io_port = 0, pad = "\000\000\
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) 000"}, nr_vnodes = 0, zone = 0, space = 0} <repeats 1022 times>}}, disk_space = 4099542974464, vdi_inuse = {0 <repeats 11459 times>, 9007199254740992, 0 <repeats 15087 times>, 1099511627776, 0 <repeats 74397 times>, 18014398509481984, 0 <repeats 1792 time
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) s>, 262144, 0 <repeats 24409 times>, 137438953472, 0 <repeats 127514 times>, 131072, 0 <repeats 2506 times>, 281474976710656, 0 <repeats 4973 times>}, local_req_efd = 11, local_req_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __ki
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) nd = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, local_req_queue = {next = 0x84a848, prev = 0x84a848}, req_wait_queue = {next = 0x84a858, prev = 0x84a858}, nr_outstanding_reqs = 1, gateway_only
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739)  = false, nosync = false, gateway_wqueue = 0x1b904f0, io_wqueue = 0x1b90730, deletion_wqueue = 0x1b90bb0, recovery_wqueue = 0x1b90970, recovery_notify_wqueue = 0x0, block_wqueue = 0x1b90df0, oc_reclaim_wqueue = 0x1b91270, oc_push_wqueue = 0x1b8f5d0, md_wq
> Sep 07 21:59:01  EMERG [gway 131160] __sd_dump_variable(739) ueue = 0x1b91030, enable_object_cache = true, object_cache_size = 100000, object_cache_directio = false, use_journal = {val = 1}, backend_dio = false, upgrade = false}
> 
> Sep 07 21:59:02  ERROR [main] crash_handler(490) sheep pid 109277 exited
> unexpectedly.
> 
> > -----Original Message-----
> > From: sheepdog-users-bounces at lists.wpkg.org [mailto:sheepdog-users-
> > bounces at lists.wpkg.org] On behalf of Hitoshi Mitake
> > Sent: Friday, September 6, 2013 16:59
> > To: sheepdog-users at lists.wpkg.org; sheepdog at lists.wpkg.org
> > Subject: [sheepdog-users] [ANNOUNCE] sheepdog stable release
> > v0.7.3-rc0
> >
> > Hi sheepdog users and developers,
> >
> > I released v0.7.3-rc0 of stable branch. You can download a source
> > archive from these URLs:
> > tar.gz: https://github.com/sheepdog/sheepdog/archive/v0.7.3-rc0.tar.gz
> > zip: https://github.com/sheepdog/sheepdog/archive/v0.7.3-rc0.zip
> >
> > The most important updates of this release are:
> >  - some bugfixes for vdi deletion process
> >  - prevent losing vdi information at cluster initialization sequence
> >  - remove possibility of segfault in main event loop
> >
> > If no one complains about this release in 2 days, it will be v0.7.3 officially.
> >
> > Below is the summary of commits this release contains.
> >
> > Hitoshi Mitake (5):
> >       tests/functional: let check clean directories of passed tests in default
> >       tests/functional: unmount loopback devices before cleaning directories
> >       sheep: make the vid deletion proceduer correct order
> >       sheep: initialize vdi bitmap after completion of reading inode object
> >       sheep: set bit in vdi_inuse in atomic manner
> >
> > MORITA Kazutaka (3):
> >       sheep: don't remove vdi object from object list cache
> >       sheep: wait until there is no get_vdis work in wait_get_vdis_done()
> >       event: refresh event info after unregistering
> >
> > Robin Dong (1):
> >       fix error-io in sheepfs when using ext4 filesystem
> >
> >
> > Thanks,
> > Hitoshi
> > --
> > sheepdog-users mailing lists
> > sheepdog-users at lists.wpkg.org
> > http://lists.wpkg.org/mailman/listinfo/sheepdog-users
