[Sheepdog] A question about sheepdog's behavior ...

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Wed Oct 27 08:16:26 CEST 2010


At Tue, 26 Oct 2010 12:39:06 +0200,
Davide Casale wrote:
> 
> Hi to all,
> I've installed the Sheepdog daemon, version 0.1.0 (with corosync 1.2.0 svn
> rev. 2637), on Ubuntu 10.04 LTS.
> The corosync.conf file is (for the useful part) :
> ---
> compatibility: whitetank
> totem {
>          version: 2
>          secauth: off
>          threads: 0
>          token: 3000
>          consensus: 5000
>          interface {
>                  ringnumber: 0
>                  bindnetaddr: 192.168.7.x
>                  mcastaddr: 226.94.1.1
>                  mcastport: 5405
>          }
> }
> ---
> I've installed everything on three machines with the default redundancy
> (that's 3, is that correct? I launch sheepdog with the default
> /etc/init.d/sheepdog start).

Yes, the default redundancy is 3.

> I've got 20 GB of KVM virtual machines.
> 
> The questions are:
> 
> - is it correct that if a single node crashes (or if I stop the sheepdog
> processes with "killall sheep"), when I relaunch sheepdog ALL the data
> are rebuilt from scratch from the other two nodes (each time it restarts
> from zero bytes and climbs back up to 20 GB)?
> I thought that only the changed blocks (4 MB each) were resynchronized?

Yes, that is the correct behavior.  Sheepdog cannot detect which objects
have been updated since the previous node membership change, so to be safe
it receives all objects from the nodes that have already joined.  However,
as you say, this is worth optimizing, as sketched below.
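
One possible direction for that optimization, sketched below in C, would
be to track a per-object version number and re-fetch only objects whose
version differs from the local copy.  This is only a hedged sketch:
sheepdog 0.1.0 does not track per-object versions (which is exactly why
it re-fetches everything today), and local_has(), remote_version(), and
fetch_object() are hypothetical helpers, not sheepdog functions.

#include <stdint.h>

/* Hypothetical helpers, not part of sheepdog. */
extern int local_has(uint64_t oid, uint32_t *version_out);
extern int remote_version(uint64_t oid, uint32_t *version_out);
extern int fetch_object(uint64_t oid);

/* Re-fetch a 4 MB data object only if the remote copy is newer. */
static int recover_object(uint64_t oid)
{
        uint32_t local_ver, remote_ver;

        if (remote_version(oid, &remote_ver) < 0)
                return -1;

        if (local_has(oid, &local_ver) == 0 && local_ver == remote_ver)
                return 0;               /* unchanged: nothing to copy */

        return fetch_object(oid);       /* changed or missing: copy it */
}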

> 
> - is it correct that while the synchronization is running on a node, all
> the others are frozen (and the KVM virtual machines are frozen too)
> until the synchronization is completed?

Yes.  Currently, if a virtual machine accesses an object that is not yet
placed on the right nodes (which can happen after node membership
changes), sheepdog blocks the access until the object has been moved to
the right node.  But I think this behavior should be fixed as soon as
possible.
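
To make the current behavior concrete, here is a minimal sketch of the
blocking logic described above.  The helper names (is_object_local(),
wait_for_recovery(), do_io()) are illustrative assumptions, not
sheepdog's actual internals:

#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helpers, not part of sheepdog. */
extern int is_object_local(uint64_t oid);
extern void wait_for_recovery(uint64_t oid);
extern int do_io(uint64_t oid, void *buf, size_t len, off_t offset);

static int handle_io(uint64_t oid, void *buf, size_t len, off_t offset)
{
        /* After a membership change the object may still live on another
         * node; the VM's request is held until recovery moves it here,
         * which is why the guest appears frozen. */
        if (!is_object_local(oid))
                wait_for_recovery(oid);

        return do_io(oid, buf, len, offset);
}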

> 
> And perhaps this is a little bug:
> if, during the synchronization, I run the command 'collie node info' on
> the node being synchronized, the command hangs after the first
> output. If I stop it with CTRL+C, one of the sheep processes crashes
> when the synchronization ends, and if I relaunch sheepdog the
> synchronization starts again from the beginning (from zero bytes).
> 

The reason 'collie node info' sleeps is the same as above.  The sheep
crash should be fixed by the following patch.  Thanks for your feedback.


=
From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
Subject: [PATCH] sheep: call free_request() after decrementing reference counters

We cannot call free_request() here because client_decref() accesses
req->ci.

Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
---
 sheep/sdnet.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/sheep/sdnet.c b/sheep/sdnet.c
index 9ad0bc7..6d7e7a3 100644
--- a/sheep/sdnet.c
+++ b/sheep/sdnet.c
@@ -271,12 +271,17 @@ static void free_request(struct request *req)
 
 static void req_done(struct request *req)
 {
+	int dead = 0;
+
 	list_add(&req->r_wlist, &req->ci->done_reqs);
 	if (conn_tx_on(&req->ci->conn)) {
 		dprintf("connection seems to be dead\n");
-		free_request(req);
+		dead = 1;
 	}
 	client_decref(req->ci);
+
+	if (dead)
+		free_request(req);
 }
 
 static void init_rx_hdr(struct client_info *ci)
-- 
1.5.6.5
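
For readers skimming the patch: the bug it fixes is a use-after-free.
The old req_done() called free_request(req) and then
client_decref(req->ci), which dereferences the freed request.  The fix
records the decision in a flag and frees the request only after the last
access through it.  A minimal standalone illustration of the same
pattern (simplified, not sheepdog code) looks like this:

#include <stdlib.h>

struct client { int refcnt; };
struct request { struct client *ci; };

static void client_decref(struct client *ci)
{
        ci->refcnt--;
}

static void req_done(struct request *req, int conn_dead)
{
        /* Old, buggy order: if (conn_dead) free(req); here -- then the
         * client_decref() below would read freed memory. */
        client_decref(req->ci);         /* last access through req */

        if (conn_dead)
                free(req);              /* safe: req is not touched again */
}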



