[Sheepdog] Configuring simple cluster on CentOS 5.5 x86_64
Yuriy Kohut
ykohut at onapp.com
Tue Oct 19 17:40:47 CEST 2010
Got it working.
The next step is to try all of that on a real 3-node hardware cluster.
Thank you for the help.
---
Yura
On Oct 19, 2010, at 3:02 PM, MORITA Kazutaka wrote:
> Hi,
>
> Your sheep.log says
>
> Oct 19 05:59:06 send_message(169) failed to send message, 2
>
> This means that the sheep daemon failed to communicate with corosync.
> Unfortunately, I've never seen such an error...
>
> Try the following things (rough example commands below):
> - restart the corosync daemon
> - disable iptables and restart corosync
> - disable SELinux and restart corosync
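>
> On CentOS, that would be something like the following (a sketch; it assumes corosync was installed with an init script, and disabling the firewall and SELinux is only for testing):
>
> # service corosync restart
> # service iptables stop     # temporarily disable the firewall
> # setenforce 0              # switch SELinux to permissive mode
> # service corosync restart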
>
> Did sheepdog work fine when you tested it on Debian?
>
> Thanks,
>
> Kazutaka
>
> On 2010/10/19 19:56, Yuriy Kohut wrote:
>> Attached.
>>
>> Please feel free to ping me if anything else is required.
>>
>> ---
>> Yura
>>
>> On Oct 19, 2010, at 1:45 PM, MORITA Kazutaka wrote:
>>
>>> Could you send me the sheep.log from the store directory?
>>> It would be helpful for debugging.
>>>
>>> Kazutaka
>>>
>>> On 2010/10/19 19:16, Yuriy Kohut wrote:
>>>> The patch doesn't help.
>>>>
>>>> Perhaps I'm doing something wrong, but the following operation never finishes:
>>>> # tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b test0 --bstype sheepdog
>>>>
>>>>
>>>> Attached please find the strace log of that command, archived:
>>>> strace.log.tar.gz
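>>>>
>>>> For reference, a log like this can be produced with:
>>>> # strace -f -o strace.log tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b test0 --bstype sheepdog
>>>> # tar czf strace.log.tar.gz strace.log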
>>>>
>>>> Please advise.
>>>>
>>>> Thank you
>>>> ---
>>>> Yura
>>>>
>>>> On Oct 19, 2010, at 11:52 AM, Yuriy Kohut wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sure. I'll let you know results.
>>>>>
>>>>> Thank you.
>>>>> ---
>>>>> Yura
>>>>>
>>>>> On Oct 19, 2010, at 11:46 AM, MORITA Kazutaka wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> At Fri, 15 Oct 2010 17:33:18 +0300,
>>>>>> Yuriy Kohut wrote:
>>>>>>> One more issue with tgtd.
>>>>>>>
>>>>>>> Initially we have one sheepdog vdi (on which we would like to create an iSCSI unit) and no tgt targets/units:
>>>>>>> [root@centos ~]# tgtadm --op show --mode target
>>>>>>> [root@centos ~]# collie vdi list
>>>>>>>   name        id    size    used  shared    creation time   vdi id
>>>>>>> ------------------------------------------------------------------
>>>>>>>   test0        1  4.0 GB  4.0 GB  0.0 MB  2010-10-15 17:55  fd34af
>>>>>>> [root@centos ~]#
>>>>>>>
>>>>>>>
>>>>>>> Creating new target:
>>>>>>> [root@centos ~]# tgtadm --op new --mode target --tid 1 -T some.vps:disk0
>>>>>>> [root@centos ~]# tgtadm --op show --mode target
>>>>>>> Target 1: some.vps:disk0
>>>>>>>     System information:
>>>>>>>         Driver: iscsi
>>>>>>>         State: ready
>>>>>>>     I_T nexus information:
>>>>>>>     LUN information:
>>>>>>>         LUN: 0
>>>>>>>             Type: controller
>>>>>>>             SCSI ID: IET 00010000
>>>>>>>             SCSI SN: beaf10
>>>>>>>             Size: 0 MB
>>>>>>>             Online: Yes
>>>>>>>             Removable media: No
>>>>>>>             Readonly: No
>>>>>>>             Backing store type: null
>>>>>>>             Backing store path: None
>>>>>>>             Backing store flags:
>>>>>>>     Account information:
>>>>>>>     ACL information:
>>>>>>> [root@centos ~]#
>>>>>>>
>>>>>>>
>>>>>>> Trying to create a new logical unit on the existing tgt target, backed by the sheepdog vdi:
>>>>>>> [root@centos ~]# tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b test0 --bstype sheepdog
>>>>>>>
>>>>>>>
>>>>>>> But the process never ends.
>>>>>>> Please advise ...
>>>>>> Thanks for your report.
>>>>>>
>>>>>> Can you try the following patch I sent a few minutes ago?
>>>>>> http://lists.wpkg.org/pipermail/sheepdog/2010-October/000741.html
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Kazutaka
>>>>>>
>>>>>>> ---
>>>>>>> Yura
>>>>>>>
>>>>>>> On Oct 15, 2010, at 4:55 PM, Yuriy Kohut wrote:
>>>>>>>
>>>>>>>> Cool, that works.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> ---
>>>>>>>> Yura
>>>>>>>>
>>>>>>>> On Oct 15, 2010, at 3:52 PM, MORITA Kazutaka wrote:
>>>>>>>>
>>>>>>>>> At Fri, 15 Oct 2010 13:38:16 +0300,
>>>>>>>>> Yuriy Kohut wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm using the following 'Getting Started' guide to configure a simple cluster:
>>>>>>>>>> http://www.osrg.net/sheepdog/usage.html
>>>>>>>>>>
>>>>>>>>>> I have configured the cluster on 1 node/box, so the first questions are:
>>>>>>>>>> Can I configure a cluster on a single node (1 box) under CentOS 5.5 x86_64?
>>>>>>>>>> Or are at least 3 nodes/boxes required?
>>>>>>>>>>
>>>>>>>>>> I have run into the following issue on my single-node cluster: I rebooted the box after creating my first image. The setup was as follows:
>>>>>>>>>> - corosync is up and running
>>>>>>>>>> udp 0 0 192.168.128.195:5404 0.0.0.0:* 3541/corosync
>>>>>>>>>> udp 0 0 192.168.128.195:5405 0.0.0.0:* 3541/corosync
>>>>>>>>>> udp 0 0 226.94.1.1:5405 0.0.0.0:* 3541/corosync
>>>>>>>>>>
>>>>>>>>>> - sheep is up and running
>>>>>>>>>> tcp 0 0 0.0.0.0:7000 0.0.0.0:* LISTEN 3561/sheep
>>>>>>>>>>
>>>>>>>>>> - cluster is formatted with 1 copy only
>>>>>>>>>> # collie cluster format --copies=1
>>>>>>>>>>
>>>>>>>>>> - the image with preallocated data is created
>>>>>>>>>> # qemu-img create sheepdog:test0 -o preallocation=data 4G
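>>>>>>>>>>
>>>>>>>>>> Put together, the bring-up sequence was roughly the following (the store directory /var/lib/sheepdog is just an example path; substitute whatever path sheep is started with):
>>>>>>>>>>
>>>>>>>>>> # service corosync start
>>>>>>>>>> # sheep /var/lib/sheepdog
>>>>>>>>>> # collie cluster format --copies=1
>>>>>>>>>> # qemu-img create sheepdog:test0 -o preallocation=data 4G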
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So after these simple steps I got:
>>>>>>>>>> # collie vdi list
>>>>>>>>>>   name        id    size    used  shared    creation time   vdi id
>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>>   test0        1  4.0 GB  4.0 GB  0.0 MB  2010-10-15 12:42  fd34af
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then I rebooted the box, and no images were available after the box came back. The vdi list shows nothing:
>>>>>>>>>> # collie vdi list
>>>>>>>>>>   name        id    size    used  shared    creation time   vdi id
>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> and 'collie vdi list' never ends ...
>>>>>>>>>> corosync and sheep are still running.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Could somebody assist me with that?
>>>>>>>>> Sorry, the following patch fixes the problem.
>>>>>>>>>
>>>>>>>>> =
>>>>>>>>> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>>>>>>>> Subject: [PATCH] sheep: call start_recovery when cluster restarts with one node
>>>>>>>>>
>>>>>>>>> Sheepdog recovers objects before starting the storage service, and the
>>>>>>>>> routine is called when nodes join. However, if sheepdog consists
>>>>>>>>> of only one node, no node sends a join message, so
>>>>>>>>> start_recovery is never called. This patch fixes the problem.
>>>>>>>>>
>>>>>>>>> Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>>>>>>>> ---
>>>>>>>>> sheep/group.c | 3 +++
>>>>>>>>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>>>>>> index ba8cdfb..86cbdb8 100644
>>>>>>>>> --- a/sheep/group.c
>>>>>>>>> +++ b/sheep/group.c
>>>>>>>>> @@ -1226,6 +1226,9 @@ static void __sd_confchg_done(struct cpg_event *cevent)
>>>>>>>>>
>>>>>>>>> update_cluster_info(&msg);
>>>>>>>>>
>>>>>>>>> + if (sys->status == SD_STATUS_OK) /* sheepdog starts with one node */
>>>>>>>>> + start_recovery(sys->epoch, NULL, 0);
>>>>>>>>> +
>>>>>>>>> return;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 1.5.6.5
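>>>>>>>>>
>>>>>>>>> To apply it, save this mail to a file and run something like the following in the sheepdog source tree (a sketch; the file name is arbitrary, and the install step assumes the usual make-based build):
>>>>>>>>>
>>>>>>>>> # git am sheep-single-node-recovery.patch
>>>>>>>>> # make && make install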
>>>>>>>>>