[sheepdog-users] Snapshot and Cache stress test

Tue Dec 17 17:11:33 CET 2013

On Tue, Dec 17, 2013 at 04:57:53PM +0100, Valerio Pachera wrote:
> Hi, I've been repeating a snapshot stress test with dev branch
> (0.7.0_197_g9f718d2).
> 
> Host side
> 
> while true; do
>     dog vdi snapshot wheezy_template
>     sleep 45
> done
> 
> Guest side
> 
> for n in $(seq 1 27)
> do
>     dd if=/dev/zero of=c$n bs=1M count=256
>     sync
>     sleep 5
> done
> 
> It works for some time, then I get this in sheep.log:
> Dec 17 16:28:09   WARN [gway 3741] wait_forward_request(413) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Dec 17 16:28:09   WARN [gway 3758] wait_forward_request(413) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Dec 17 16:28:09   WARN [gway 3773] wait_forward_request(413) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> Dec 17 16:28:09   WARN [gway 3775] wait_forward_request(413) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
> 

This probably means the whole cluster is too busy. Is your cache on SSD? All
you have to do is wait for the outstanding IO to complete.
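
One way to see whether the cluster is catching up is to check whether these
warnings keep accumulating. A minimal sketch, assuming you pass in the path to
your own sheep.log (the function name count_poll_timeouts is only illustrative,
it is not part of sheepdog):

```shell
# Count "poll timeout" warnings emitted by wait_forward_request in a sheep log.
# $1: path to sheep.log (assumption: you know where your store directory is)
count_poll_timeouts() {
    # grep -c prints the number of matching lines, 0 if there are none
    grep -c 'wait_forward_request.*poll timeout' "$1"
}
```

Run `count_poll_timeouts /path/to/sheep.log` twice, about a minute apart. If the
number has stopped growing, the gateway is probably no longer waiting on busy
disks or network.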

> dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s wheezy_template     1   10 GB  1.8 GB  0.0 MB 2013-12-17 15:35   5ddf88     2
> s wheezy_template     2   10 GB  420 MB  1.5 GB 2013-12-17 15:44   5ddf89     2
> s wheezy_template     3   10 GB  528 MB  1.9 GB 2013-12-17 16:26   5ddf8a     2
> s wheezy_template     4   10 GB  320 MB  2.4 GB 2013-12-17 16:27   5ddf8b     2
> s wheezy_template     5   10 GB  540 MB  2.6 GB 2013-12-17 16:27   5ddf8c     2
> s wheezy_template     6   10 GB  924 MB  3.1 GB 2013-12-17 16:27   5ddf8d     2
> s wheezy_template     7   10 GB  1.4 GB  4.0 GB 2013-12-17 16:28   5ddf8e     2
> s wheezy_template     8   10 GB  1.3 GB  5.4 GB 2013-12-17 16:28   5ddf8f     2
> s wheezy_template     9   10 GB  1.5 GB  6.7 GB 2013-12-17 16:29   5ddf90     2
> s wheezy_template    10   10 GB  276 MB  8.2 GB 2013-12-17 16:30   5ddf91     2
> s wheezy_template    11   10 GB  0.0 MB  8.4 GB 2013-12-17 16:31   5ddf92     2
> s wheezy_template    12   10 GB  0.0 MB  8.4 GB 2013-12-17 16:32   5ddf93     2
> s wheezy_template    13   10 GB  0.0 MB  8.4 GB 2013-12-17 16:32   5ddf94     2
> s wheezy_template    14   10 GB  0.0 MB  8.4 GB 2013-12-17 16:33   5ddf95     2
> s wheezy_template    15   10 GB  8.0 MB  8.4 GB 2013-12-17 16:34   5ddf96     2
> s wheezy_template    16   10 GB   16 MB  8.4 GB 2013-12-17 16:35   5ddf97     2
> s wheezy_template    17   10 GB  0.0 MB  8.4 GB 2013-12-17 16:35   5ddf98     2
> s wheezy_template    18   10 GB  0.0 MB  8.4 GB 2013-12-17 16:36   5ddf99     2
> s wheezy_template    19   10 GB  0.0 MB  8.4 GB 2013-12-17 16:37   5ddf9a     2
> s wheezy_template    20   10 GB  0.0 MB  8.4 GB 2013-12-17 16:38   5ddf9b     2
> s wheezy_template    21   10 GB   20 MB  8.4 GB 2013-12-17 16:38   5ddf9c     2
> s wheezy_template    22   10 GB  284 MB  8.3 GB 2013-12-17 16:39   5ddf9d     2
> s wheezy_template    23   10 GB  0.0 MB  8.6 GB 2013-12-17 16:40   5ddf9e     2
> s wheezy_template    24   10 GB  0.0 MB  8.6 GB 2013-12-17 16:41   5ddf9f     2
> s wheezy_template    25   10 GB  0.0 MB  8.6 GB 2013-12-17 16:41   5ddfa0     2
> 
> Guest side
> task XXXX blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 

This means some IOs have not returned yet. You could log in to the VM as
another user to check whether the VM is really dead or just has some hung IO
tasks.
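
From that second login, a quick check is to list the tasks in uninterruptible
sleep (state D), which is the state the hung task warning refers to. A minimal
sketch, assuming the guest has a procps-style ps that supports the stat and
comm output columns (the function names are only illustrative):

```shell
# Keep only "PID STAT COMM" lines whose STAT field starts with D
# (uninterruptible sleep, usually a task blocked on IO).
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# List the tasks currently in state D inside the guest.
list_blocked_tasks() {
    # "-o name=" prints the column without a header line
    ps -e -o pid= -o stat= -o comm= | filter_d_state
}
```

If `list_blocked_tasks` shows only the dd and sync tasks and the shell itself
stays responsive, the VM is most likely just waiting on IO rather than dead.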

Thanks
Yuan