[sheepdog-users] Snapshot and Cache stress test

Liu Yuan namei.unix at gmail.com
Tue Aug 20 05:36:49 CEST 2013


On Mon, Aug 19, 2013 at 04:28:34PM +0200, Valerio Pachera wrote:
> Hi, I'm simulating a heavy I/O situation where I need also to take snapshots.
> 
> Guest is running with cache enabled and virtio.
> 
> On the host side I created  script that takes a snapshot each 10 seconds:
> 
> ---
> while true;
> do
>         collie vdi snapshot big
>         sleep 10
> done
> ---
> 
> 
> On guest side I try to create 1000 small files of 1M.
> While these files are written, snapshots are taken.
> 
> ---
> for n in $(seq 1 1000);
> do
>   dd if=/dev/zero of=c$n bs=1M count=1
> done
> ---
> 
> The result has been the crash of the node running qemu.
> 
> In sheep.log i get first lots of
> 
> ----
> Aug 19 16:07:52  ERROR [gway 29363] gateway_forward_request(315) fail
> to write local 4fd25d00000000, No object found
> Aug 19 16:07:52  ERROR [gway 29363] wait_forward_request(212) fail
> 4fd25d00000000, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000000, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000000 fail, 2
> Aug 19 16:07:52  ERROR [gway 29361] gateway_forward_request(315) fail
> to write local 4fd25d00000001, No object found
> Aug 19 16:07:52  ERROR [gway 29361] wait_forward_request(212) fail
> 4fd25d00000001, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000001, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000001 fail, 2
> Aug 19 16:07:52  ERROR [gway 29364] gateway_forward_request(315) fail
> to write local 4fd25d00000002, No object found
> Aug 19 16:07:52  ERROR [gway 29364] wait_forward_request(212) fail
> 4fd25d00000002, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000002, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000002 fail, 2
> Aug 19 16:07:52  ERROR [gway 29363] gateway_forward_request(315) fail
> to write local 4fd25d00000003, No object found
> Aug 19 16:07:52  ERROR [gway 29363] wait_forward_request(212) fail
> 4fd25d00000003, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000003, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000003 fail, 2
> Aug 19 16:07:52  ERROR [gway 29361] gateway_forward_request(315) fail
> to write local 4fd25d00000004, No object found
> Aug 19 16:07:52  ERROR [gway 29361] wait_forward_request(212) fail
> 4fd25d00000004, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000004, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000004 fail, 2
> Aug 19 16:07:52  ERROR [gway 29364] wait_forward_request(212) fail
> 4fd25d00000005, No object found
> Aug 19 16:07:52  ERROR [gway 29364] wait_forward_request(212) fail
> 4fd25d00000005, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000005, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000005 fail, 2
> Aug 19 16:07:52  ERROR [gway 29363] gateway_forward_request(315) fail
> to write local 4fd25d00000006, No object found
> Aug 19 16:07:52  ERROR [gway 29363] wait_forward_request(212) fail
> 4fd25d00000006, No object found
> Aug 19 16:07:52  ERROR [deletion] remove_object(452) failed to remove
> object 4fd25d00000006, No object found
> Aug 19 16:07:52  ERROR [deletion] delete_one(679) remove object
> 4fd25d00000006 fail, 2
> Aug 19 16:07:52  ERROR [gway 29361] gateway_forward_request(315) fail
> to write local 4fd25d00000007, No object found
> Aug 19 16:07:52  ERROR [gway 29361] wait_forward_request(212) fail
> 4fd25d00000007, No object found
> ...
> ---
> 
> Then lot's of
> 
> ---
> Aug 19 16:08:25  ERROR [net 29092] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:08:25  ERROR [net 29092] rx_work(537) failed to read a header
> Aug 19 16:08:25  ERROR [net 29092] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:08:25  ERROR [net 29092] rx_work(537) failed to read a header
> Aug 19 16:08:25  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:08:25  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:08:25  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:08:25  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:12:57  ERROR [net 29097] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:57  ERROR [net 29097] rx_work(537) failed to read a header
> Aug 19 16:12:57  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:57  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:12:57  ERROR [net 29092] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:57  ERROR [net 29092] rx_work(537) failed to read a header
> Aug 19 16:12:57  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:57  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:12:58  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:58  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:12:58  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:12:58  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:12:58  ERROR [net 29092] do_read(220) connection is closed
> (48 bytes left)
> ...
> ----
> 
> And in the end
> 
> ---
> Aug 19 16:13:38  ERROR [gway 29851] wait_forward_request(212) fail
> 4fd26400000072, No object found
> Aug 19 16:13:38  ERROR [oc_push 27558] push_cache_object(519) failed
> to push object 4fd26400000072, No object found
> Aug 19 16:13:38  EMERG [oc_push 27558] do_push_object(901) PANIC: push
> failed but should never fail
> Aug 19 16:13:38  EMERG [oc_push 27558] crash_handler(250) sheep exits
> unexpectedly (Aborted).
> Aug 19 16:13:38  ERROR [net 29784] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29784] rx_work(537) failed to read a header
> Aug 19 16:13:38  ERROR [net 29784] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29784] rx_work(537) failed to read a header
> Aug 19 16:13:38  ERROR [net 29096] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29096] rx_work(537) failed to read a header
> Aug 19 16:13:38  ERROR [net 29097] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29097] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(843) sheep.c:252:
> crash_handler
> Aug 19 16:13:38  ERROR [net 29807] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29807] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(857)
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xf02f) [0x7f508d6ca02f]
> Aug 19 16:13:38  ERROR [net 29808] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29808] rx_work(537) failed to read a header
> Aug 19 16:13:38  ERROR [net 29784] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29784] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(857)
> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7f508ccd6474]
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(857)
> /lib/x86_64-linux-gnu/libc.so.6(abort+0x17f) [0x7f508ccd96ef]
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(843)
> object_cache.c:901: do_push_object
> Aug 19 16:13:38  ERROR [net 29807] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29807] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(843) work.c:294:
> worker_routine
> Aug 19 16:13:38  ERROR [net 29808] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29808] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(857)
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b4f) [0x7f508d6c1b4f]
> Aug 19 16:13:38  ERROR [net 29807] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29807] rx_work(537) failed to read a header
> Aug 19 16:13:38  EMERG [oc_push 27558] sd_backtrace(857)
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6c) [0x7f508cd7ea7c]
> Aug 19 16:13:38  ERROR [net 29097] do_read(220) connection is closed
> (48 bytes left)
> Aug 19 16:13:38  ERROR [net 29097] rx_work(537) failed to read a header
> Aug 19 16:13:39  ERROR [main] crash_handler(490) sheep pid 24464
> exited unexpectedly.
> ---
> 
> Here are the snapshot I was able to take:
> 
> # collie vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s big          1   50 GB   44 MB  0.0 MB 2013-08-19 16:08   4fd259
> 2
> s big          2   50 GB  0.0 MB   44 MB 2013-08-19 16:12   4fd25e
> 2
> s big          3   50 GB  0.0 MB   44 MB 2013-08-19 16:12   4fd25f
> 2
> s big          4   50 GB  220 MB   40 MB 2013-08-19 16:12   4fd260
> 2
> s big          5   50 GB   40 MB  252 MB 2013-08-19 16:13   4fd261
> 2
> s big          6   50 GB  8.0 MB  284 MB 2013-08-19 16:13   4fd262
> 2
> s big          7   50 GB  448 MB   40 MB 2013-08-19 16:13   4fd263
> 2
>   big          0   50 GB   20 MB  496 MB 2013-08-19 16:13   4fd264     2
> 
> PS: I notice now I used collie instead of dog.
> Is that a problem?
> 
> sheep 0.7.0_5_g24cf669
> qemu 1.6.50

Could you go back to v0.7.0 to see if the problem still exist?

$ git checkout -b test v0.7.0 # this will checkout into a test branch with v0.7.0

Thanks
Yuan



More information about the sheepdog-users mailing list