[sheepdog] [PATCH 1/2] test: add a test for sockfd keepalive

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Sep 3 13:15:07 CEST 2012


At Mon, 27 Aug 2012 17:32:33 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly at taobao.com>
> 
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
>  tests/035     |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/035.out |   42 ++++++++++++++++++++++++++++++++++++++++++
>  tests/group   |    1 +
>  3 files changed, 97 insertions(+)
>  create mode 100755 tests/035
>  create mode 100644 tests/035.out
> 
> diff --git a/tests/035 b/tests/035
> new file mode 100755
> index 0000000..501f959
> --- /dev/null
> +++ b/tests/035
> @@ -0,0 +1,54 @@
> +#!/bin/bash
> +
> +# Test sockfd keepalive
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1        # failure is the default!
> +
> +trap "_uninit; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +_uninit()
> +{
> +	iptables -D INPUT -p tcp --sport 7001 -j DROP
> +	iptables -D INPUT -p tcp --dport 7001 -j DROP
> +}
> +
> +_cleanup
> +
> +for i in `seq 0 1 2`; do
> +    _start_sheep $i
> +done
> +
> +_wait_for_sheep 3
> +
> +$COLLIE cluster format -c 3 -m unsafe
> +
> +$COLLIE vdi create test 40M
> +(
> +dd if=/dev/urandom | $COLLIE vdi write test
> +) &
> +
> +sleep 3
> +# Simulate machine(127.0.0.1:7001) down
> +iptables -A INPUT -p tcp --sport 7001 -j DROP
> +iptables -A INPUT -p tcp --dport 7001 -j DROP
> +
> +sleep 1
> +# Trigger the confchg
> +_kill_sheep 1
> +
> +_wait_for_collie
> +
> +for i in `seq 0 9`; do
> +	$COLLIE vdi object -i $i test
> +done
> +
> +status=0
> diff --git a/tests/035.out b/tests/035.out
> new file mode 100644
> index 0000000..0c55d7e
> --- /dev/null
> +++ b/tests/035.out
> @@ -0,0 +1,42 @@
> +QA output created by 035
> +using backend farm store
> +Looking for the object 0x7c2b2500000000 (the inode vid 0x7c2b25 idx 0) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000001 (the inode vid 0x7c2b25 idx 1) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000002 (the inode vid 0x7c2b25 idx 2) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000003 (the inode vid 0x7c2b25 idx 3) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000004 (the inode vid 0x7c2b25 idx 4) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000005 (the inode vid 0x7c2b25 idx 5) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000006 (the inode vid 0x7c2b25 idx 6) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000007 (the inode vid 0x7c2b25 idx 7) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000008 (the inode vid 0x7c2b25 idx 8) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> +Looking for the object 0x7c2b2500000009 (the inode vid 0x7c2b25 idx 9) with 2 nodes
> +
> +127.0.0.1:7000 has the object (should be 2 copies)
> +127.0.0.1:7002 has the object (should be 2 copies)
> diff --git a/tests/group b/tests/group
> index d20de40..1dafad4 100644
> --- a/tests/group
> +++ b/tests/group
> @@ -46,3 +46,4 @@
>  032 auto quick store
>  033 auto quick store
>  034 auto quick store
> +035 auto quick cluster

I found that this script occasionally takes a long time (about 15
minutes).  Perhaps TCP keepalive is not working in some situations?
This problem is highly reproducible in my environment with the
following script.

 $ while test "$?" -eq 0; do ./check 35 -corosync; done
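One possible explanation is offered here as an unverified hypothesis; it is not a finding from the report above. On Linux, keepalive probes are only sent on an idle connection. If the background `collie vdi write` still has unacknowledged data on a socket to 127.0.0.1:7001 when the iptables rules are installed, the kernel retransmits instead of probing. It gives up only after `net.ipv4.tcp_retries2` retries. The sketch below estimates that timeout using the formula the kernel documents for `tcp_retries2`. The helper name `tcp_retrans_timeout_ms` is hypothetical, and the sketch assumes the default RTO bounds (200 ms minimum, 120 s maximum).

```shell
#!/bin/bash
# Hypothetical helper (not part of sheepdog): estimate how long Linux keeps
# retransmitting unacknowledged data before dropping the connection.
# Assumes default RTO bounds: TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s.

tcp_retrans_timeout_ms()
{
	local boundary=$1	# value of net.ipv4.tcp_retries2
	local rto_base=200	# ms, assumed TCP_RTO_MIN
	local rto_max=120000	# ms, assumed TCP_RTO_MAX
	local thresh=0

	# thresh = ilog2(rto_max / rto_base): retries before the RTO hits the cap
	while [ $(( 1 << (thresh + 1) )) -le $(( rto_max / rto_base )) ]; do
		thresh=$(( thresh + 1 ))
	done

	if [ "$boundary" -le "$thresh" ]; then
		# RTO doubles on every retry and never reaches the cap
		echo $(( ((2 << boundary) - 1) * rto_base ))
	else
		# Exponential phase up to the cap, then one rto_max per extra retry
		echo $(( ((2 << thresh) - 1) * rto_base + (boundary - thresh) * rto_max ))
	fi
}

# Read the running kernel's setting; fall back to the usual default of 15
retries2=$(cat /proc/sys/net/ipv4/tcp_retries2 2>/dev/null || echo 15)
ms=$(tcp_retrans_timeout_ms "$retries2")
printf 'tcp_retries2=%s -> about %d.%d s before the connection is dropped\n' \
	"$retries2" $(( ms / 1000 )) $(( ms % 1000 / 100 ))
```

With the default `tcp_retries2=15` this gives about 924.6 s, roughly the 15 minutes observed. To check the hypothesis during a hang, `ss -tno '( sport = :7001 or dport = :7001 )'` shows which timer each socket is running. A `keepalive` timer means the socket is being probed, while `on` means it is retransmitting. If retransmission turns out to be the cause, the `TCP_USER_TIMEOUT` socket option (Linux 2.6.37 and later) bounds how long sent data may stay unacknowledged, independent of keepalive.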

I wonder if we should dig into this problem.  Can we close all
connections when the epoch is incremented?  I think you tried this
before but gave up.

  http://www.mail-archive.com/sheepdog@lists.wpkg.org/msg04524.html

Is it still difficult to implement this approach in the current
Sheepdog?

Thanks,

Kazutaka
