On Mon, Aug 19, 2013 at 04:28:34PM +0200, Valerio Pachera wrote: > Hi, I'm simulating a heavy I/O situation where I need also to take snapshots. > > Guest is running with cache enabled and virtio. > > On the host side I created script that takes a snapshot each 10 seconds: > > --- > while true; > do > collie vdi snapshot big > sleep 10 > done > --- > > > On guest side I try to create 1000 small files of 1M. > While these files are written, snapshots are taken. > > --- > for n in $(seq 1 1000); > do > dd if=/dev/zero of=c$n bs=1M count=1 > done > --- > > The result has been the crash of the node running qemu. > > In sheep.log i get first lots of > > ---- > Aug 19 16:07:52 ERROR [gway 29363] gateway_forward_request(315) fail > to write local 4fd25d00000000, No object found > Aug 19 16:07:52 ERROR [gway 29363] wait_forward_request(212) fail > 4fd25d00000000, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000000, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000000 fail, 2 > Aug 19 16:07:52 ERROR [gway 29361] gateway_forward_request(315) fail > to write local 4fd25d00000001, No object found > Aug 19 16:07:52 ERROR [gway 29361] wait_forward_request(212) fail > 4fd25d00000001, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000001, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000001 fail, 2 > Aug 19 16:07:52 ERROR [gway 29364] gateway_forward_request(315) fail > to write local 4fd25d00000002, No object found > Aug 19 16:07:52 ERROR [gway 29364] wait_forward_request(212) fail > 4fd25d00000002, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000002, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000002 fail, 2 > Aug 19 16:07:52 ERROR [gway 29363] gateway_forward_request(315) fail > to write local 4fd25d00000003, No object found > Aug 19 16:07:52 ERROR [gway 29363] wait_forward_request(212) fail > 4fd25d00000003, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000003, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000003 fail, 2 > Aug 19 16:07:52 ERROR [gway 29361] gateway_forward_request(315) fail > to write local 4fd25d00000004, No object found > Aug 19 16:07:52 ERROR [gway 29361] wait_forward_request(212) fail > 4fd25d00000004, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000004, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000004 fail, 2 > Aug 19 16:07:52 ERROR [gway 29364] wait_forward_request(212) fail > 4fd25d00000005, No object found > Aug 19 16:07:52 ERROR [gway 29364] wait_forward_request(212) fail > 4fd25d00000005, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000005, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000005 fail, 2 > Aug 19 16:07:52 ERROR [gway 29363] gateway_forward_request(315) fail > to write local 4fd25d00000006, No object found > Aug 19 16:07:52 ERROR [gway 29363] wait_forward_request(212) fail > 4fd25d00000006, No object found > Aug 19 16:07:52 ERROR [deletion] remove_object(452) failed to remove > object 4fd25d00000006, No object found > Aug 19 16:07:52 ERROR [deletion] delete_one(679) remove object > 4fd25d00000006 fail, 2 > Aug 19 16:07:52 ERROR [gway 29361] gateway_forward_request(315) fail > to write local 4fd25d00000007, No object found > Aug 19 16:07:52 ERROR [gway 29361] wait_forward_request(212) fail > 4fd25d00000007, No object found > ... > --- > > Then lot's of > > --- > Aug 19 16:08:25 ERROR [net 29092] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:08:25 ERROR [net 29092] rx_work(537) failed to read a header > Aug 19 16:08:25 ERROR [net 29092] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:08:25 ERROR [net 29092] rx_work(537) failed to read a header > Aug 19 16:08:25 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:08:25 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:08:25 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:08:25 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:12:57 ERROR [net 29097] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:57 ERROR [net 29097] rx_work(537) failed to read a header > Aug 19 16:12:57 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:57 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:12:57 ERROR [net 29092] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:57 ERROR [net 29092] rx_work(537) failed to read a header > Aug 19 16:12:57 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:57 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:12:58 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:58 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:12:58 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:12:58 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:12:58 ERROR [net 29092] do_read(220) connection is closed > (48 bytes left) > ... > ---- > > And in the end > > --- > Aug 19 16:13:38 ERROR [gway 29851] wait_forward_request(212) fail > 4fd26400000072, No object found > Aug 19 16:13:38 ERROR [oc_push 27558] push_cache_object(519) failed > to push object 4fd26400000072, No object found > Aug 19 16:13:38 EMERG [oc_push 27558] do_push_object(901) PANIC: push > failed but should never fail > Aug 19 16:13:38 EMERG [oc_push 27558] crash_handler(250) sheep exits > unexpectedly (Aborted). > Aug 19 16:13:38 ERROR [net 29784] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29784] rx_work(537) failed to read a header > Aug 19 16:13:38 ERROR [net 29784] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29784] rx_work(537) failed to read a header > Aug 19 16:13:38 ERROR [net 29096] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29096] rx_work(537) failed to read a header > Aug 19 16:13:38 ERROR [net 29097] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29097] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(843) sheep.c:252: > crash_handler > Aug 19 16:13:38 ERROR [net 29807] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29807] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(857) > /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f508d6ca02f] > Aug 19 16:13:38 ERROR [net 29808] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29808] rx_work(537) failed to read a header > Aug 19 16:13:38 ERROR [net 29784] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29784] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(857) > /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7f508ccd6474] > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(857) > /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7f508ccd96ef] > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(843) > object_cache.c:901: do_push_object > Aug 19 16:13:38 ERROR [net 29807] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29807] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(843) work.c:294: > worker_routine > Aug 19 16:13:38 ERROR [net 29808] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29808] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(857) > /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7f508d6c1b4f] > Aug 19 16:13:38 ERROR [net 29807] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29807] rx_work(537) failed to read a header > Aug 19 16:13:38 EMERG [oc_push 27558] sd_backtrace(857) > /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f508cd7ea7c] > Aug 19 16:13:38 ERROR [net 29097] do_read(220) connection is closed > (48 bytes left) > Aug 19 16:13:38 ERROR [net 29097] rx_work(537) failed to read a header > Aug 19 16:13:39 ERROR [main] crash_handler(490) sheep pid 24464 > exited unexpectedly. > --- > > Here are the snapshot I was able to take: > > # collie vdi list > Name Id Size Used Shared Creation time VDI id Copies Tag > s big 1 50 GB 44 MB 0.0 MB 2013-08-19 16:08 4fd259 > 2 > s big 2 50 GB 0.0 MB 44 MB 2013-08-19 16:12 4fd25e > 2 > s big 3 50 GB 0.0 MB 44 MB 2013-08-19 16:12 4fd25f > 2 > s big 4 50 GB 220 MB 40 MB 2013-08-19 16:12 4fd260 > 2 > s big 5 50 GB 40 MB 252 MB 2013-08-19 16:13 4fd261 > 2 > s big 6 50 GB 8.0 MB 284 MB 2013-08-19 16:13 4fd262 > 2 > s big 7 50 GB 448 MB 40 MB 2013-08-19 16:13 4fd263 > 2 > big 0 50 GB 20 MB 496 MB 2013-08-19 16:13 4fd264 2 > > PS: I notice now I used collie instead of dog. > Is that a problem? > > sheep 0.7.0_5_g24cf669 > qemu 1.6.50 Could you go back to v0.7.0 to see if the problem still exist? $ git checkout -b test v0.7.0 # this will checkout into a test branch with v0.7.0 Thanks Yuan |