[Sheepdog] Configuring simple cluster on CentOS 5.5 x86_64
Yuriy Kohut
ykohut at onapp.com
Fri Oct 15 16:59:15 CEST 2010
Some details:
TGT is built from the latest sources in git:
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/tomo/tgt.git
# git checkout -b test origin/sheepdog
sheep uses 100% CPU while the new tgt logicalunit creation is still running:
  PID USER      PR  NI  VIRT   RES   SHR  S  %CPU %MEM    TIME+  COMMAND
 2082 root      25   0  707m  3504  2696  R 100.1  0.4 38:43.09  sheep
---
Yura
On Oct 15, 2010, at 5:33 PM, Yuriy Kohut wrote:
> One more new issue with TGTd.
>
> Initially we have one sheepdog vdi (on which we would like to create an iscsi unit) and no tgt targets/units:
> [root at centos ~]# tgtadm --op show --mode target
> [root at centos ~]# collie vdi list
>   name        id    size    used  shared     creation time   vdi id
> ------------------------------------------------------------------
>   test0        1  4.0 GB  4.0 GB  0.0 MB  2010-10-15 17:55   fd34af
> [root at centos ~]#
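The `collie vdi list` output above is a fixed-width listing. If you need to inspect it from a script, it can be parsed with a short helper; a sketch in Python, assuming the whitespace-separated layout shown in the thread (the `parse_vdi_list` helper is illustrative, not part of collie):

```python
# Hypothetical helper: turn "collie vdi list" output into dicts.
# Token positions are inferred from the sample listing above,
# not from collie's actual source.
def parse_vdi_list(text):
    vdis = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("name") or line.startswith("-"):
            continue  # skip blanks, the header row, and the separator
        parts = line.split()
        # name id size unit used unit shared unit date time vdi_id
        vdis.append({
            "name": parts[0],
            "id": int(parts[1]),
            "size": " ".join(parts[2:4]),
            "used": " ".join(parts[4:6]),
            "shared": " ".join(parts[6:8]),
            "created": " ".join(parts[8:10]),
            "vdi_id": parts[10],
        })
    return vdis

sample = """\
  name        id    size    used  shared     creation time   vdi id
------------------------------------------------------------------
  test0        1  4.0 GB  4.0 GB  0.0 MB  2010-10-15 17:55   fd34af
"""
print(parse_vdi_list(sample))
```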
>
>
> Creating new target:
> [root at centos ~]# tgtadm --op new --mode target --tid 1 -T some.vps:disk0
> [root at centos ~]# tgtadm --op show --mode target
> Target 1: some.vps:disk0
> System information:
> Driver: iscsi
> State: ready
> I_T nexus information:
> LUN information:
> LUN: 0
> Type: controller
> SCSI ID: IET 00010000
> SCSI SN: beaf10
> Size: 0 MB
> Online: Yes
> Removable media: No
> Readonly: No
> Backing store type: null
> Backing store path: None
> Backing store flags:
> Account information:
> ACL information:
> [root at centos ~]#
>
>
> Trying to create a new logicalunit on the existing tgt target and sheepdog vdi:
> [root at centos ~]# tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b test0 --bstype sheepdog
>
>
> But the process never ends.
> Please advise ...
> ---
> Yura
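When a tgtadm invocation hangs like this, it is convenient to run it under a timeout so the shell is not blocked forever while you inspect the daemon. A small sketch (assuming a modern Python 3 with `subprocess.run`; CentOS 5.5's stock Python 2.4 has no subprocess timeout, so this is for illustration):

```python
# Hypothetical wrapper: run a command, but give up after a timeout so a
# hung "tgtadm --op new --mode logicalunit ..." doesn't block forever.
import subprocess

def run_with_timeout(cmd, timeout=30):
    """Return the command's stdout, or None if it didn't finish in time."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # command hung; inspect tgtd/sheep (e.g. with strace)
    return result.stdout

# Example (the tgtadm invocation from the message above):
# out = run_with_timeout(["tgtadm", "--op", "new", "--mode", "logicalunit",
#                         "--tid", "1", "--lun", "1", "-b", "test0",
#                         "--bstype", "sheepdog"], timeout=60)
```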
>
> On Oct 15, 2010, at 4:55 PM, Yuriy Kohut wrote:
>
>> Cool, that works.
>>
>> Thanks
>> ---
>> Yura
>>
>> On Oct 15, 2010, at 3:52 PM, MORITA Kazutaka wrote:
>>
>>> At Fri, 15 Oct 2010 13:38:16 +0300,
>>> Yuriy Kohut wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm using the following 'Getting Started' guide to configure simple cluster:
>>>> http://www.osrg.net/sheepdog/usage.html
>>>>
>>>> I have configured the cluster on 1 node/box, so the first questions are:
>>>> Can I configure a cluster on a single node (1 box) under CentOS 5.5 x86_64?
>>>> Are at least 3 nodes/boxes required ... ?
>>>>
>>>> I have faced the following issue on my single-node cluster. I rebooted the box after my first image creation. The setup was as follows:
>>>> - corosync is up and running
>>>> udp 0 0 192.168.128.195:5404 0.0.0.0:* 3541/corosync
>>>> udp 0 0 192.168.128.195:5405 0.0.0.0:* 3541/corosync
>>>> udp 0 0 226.94.1.1:5405 0.0.0.0:* 3541/corosync
>>>>
>>>> - sheep is up and running
>>>> tcp 0 0 0.0.0.0:7000 0.0.0.0:* LISTEN 3561/sheep
>>>>
>>>> - cluster is formatted with 1 copy only
>>>> # collie cluster format --copies=1
>>>>
>>>> - the image with preallocated data is created
>>>> # qemu-img create sheepdog:test0 -o preallocation=data 4G
>>>>
>>>>
>>>> So after these simple steps I got:
>>>> # collie vdi list
>>>>   name        id    size    used  shared     creation time   vdi id
>>>> ------------------------------------------------------------------
>>>>   test0        1  4.0 GB  4.0 GB  0.0 MB  2010-10-15 12:42   fd34af
>>>>
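The 4.0 GB "used" figure is what full preallocation should give. As a quick sanity check: sheepdog stores a vdi as fixed-size objects (4 MB is assumed here as the default of that era), and with `preallocation=data` every data object is allocated up front:

```python
# Sanity check of the "used" column: with preallocation=data, all data
# objects of the vdi are allocated at creation time.
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB per object -- assumed default

def data_object_count(vdi_size_bytes, object_size=OBJECT_SIZE):
    # ceiling division: a partial trailing object still occupies one object
    return -(-vdi_size_bytes // object_size)

vdi_size = 4 * 1024 ** 3  # the 4 GB image created above
print(data_object_count(vdi_size))  # -> 1024 objects, all allocated = 4.0 GB used
```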
>>>>
>>>> Then I rebooted the box, and no image(s) were available after the box came back. The vdi list just shows nothing:
>>>> # collie vdi list
>>>>   name        id    size    used  shared     creation time   vdi id
>>>> ------------------------------------------------------------------
>>>>
>>>> and 'collie vdi list' never ends ...
>>>> corosync and sheep are still running.
>>>>
>>>>
>>>> Could somebody assist me with that?
>>>
>>> Sorry, the following patch fixes the problem.
>>>
>>> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>> Subject: [PATCH] sheep: call start_recovery when cluster restarts with one node
>>>
>>> Sheepdog recovers objects before starting the storage service, and the
>>> recovery routine is called when nodes join. However, if sheepdog
>>> consists of only one node, no join message is ever sent, so
>>> start_recovery is never called. This patch fixes the problem.
>>>
>>> Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>> ---
>>> sheep/group.c | 3 +++
>>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/sheep/group.c b/sheep/group.c
>>> index ba8cdfb..86cbdb8 100644
>>> --- a/sheep/group.c
>>> +++ b/sheep/group.c
>>> @@ -1226,6 +1226,9 @@ static void __sd_confchg_done(struct cpg_event *cevent)
>>>
>>> update_cluster_info(&msg);
>>>
>>> + if (sys->status == SD_STATUS_OK) /* sheepdog starts with one node */
>>> + start_recovery(sys->epoch, NULL, 0);
>>> +
>>> return;
>>> }
>>>
>>> --
>>> 1.5.6.5
>>>
>>
>> --
>> sheepdog mailing list
>> sheepdog at lists.wpkg.org
>> http://lists.wpkg.org/mailman/listinfo/sheepdog
>