[Sheepdog] connection fail and too many open files

Keiichi SHIMA shima at wide.ad.jp
Fri Sep 2 13:55:37 CEST 2011


Morita-san,

Thank you for your quick response!

I cloned the latest devel branch and tested it in the same environment as in my earlier report.

This time, all the operations have worked without any problems so far: joining nodes works fine, and there are no more 'too many open files' errors.
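
In case it is useful to anyone who sees the same error, descriptor usage of a sheep process could be watched with something like the following.  This is only a rough sketch and not part of sheepdog: it assumes Linux with /proc mounted, and fd_usage is a name made up for illustration.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process via Linux /proc.

    fd_usage is a hypothetical helper, not a sheepdog or collie API.
    soft_limit is None when the limit is reported as "unlimited".
    """
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fd_count = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            # Line layout: "Max open files  <soft>  <hard>  files"
            if line.startswith("Max open files"):
                soft = line.split()[3]
                soft_limit = int(soft) if soft.isdigit() else None
                break
    return open_fd_count, soft_limit


if __name__ == "__main__":
    count, limit = fd_usage()  # "self" inspects the calling process
    print(f"open descriptors: {count}, soft limit: {limit}")
```

Passing the PID of the sheep daemon instead of the default would show how close it is to its limit; a count that keeps growing toward the limit is what one would expect to see before accept() starts failing with 'Too many open files'.  Reading /proc/<pid>/fd of a process owned by another user needs root.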

At the moment I have only one iscsi target and lun in this environment, but I will set up a more complicated configuration and do some further tests.
If I find any other issues, I will report them on this list again.

Thank you very much.

---
Keiichi SHIMA  <shima at wide.ad.jp>
WIDE Project http://www.wide.ad.jp/



On Sep 2, 2011, at 1:23 AM, MORITA Kazutaka wrote:

> At Wed, 31 Aug 2011 14:56:35 +0900,
> Keiichi SHIMA wrote:
>> 
>> Hello,
>> 
>> I'm trying to use sheepdog as an iscsi backing store, and facing some issues.
>> 
>> I'm using 46 PCs as sheepdog storage nodes.  Making a cluster with them went fine (as long as I don't change membership), and I could create a disk image in the cluster.  I set up an iscsi target on one of the sheepdog storage nodes, and set up another PC as an iscsi initiator.  I could mount the sheepdog disk over the iscsi protocol.  I checked whether I could make a filesystem on the mounted iscsi volume.  It all went fine.
>> 
>> But once I unmounted the volume (causing a sync to the disk), the sheepdog cluster started complaining.  The log file of the storage node, which is also the iscsi target node, started showing the following error messages.
>> 
>>  Aug 31 02:22:06 forward_write_obj_req(396) failed to connect to 2001:200:d00:101::43:7000
>>  Aug 31 02:22:06 store_queue_request(854) failed, 42, 3, 62ee040000001a , 1, 129
>> 
>> In the above case, the failed node was 2001:200:d00:101::43, but there were many similar errors for different nodes.
>> 
>> I tried to run collie on the node, but collie didn't respond.  From this point, the sheep started generating the following error messages.
>> 
>>  Aug 31 02:25:56 listen_handler(567) can't accept a new connection, Too many open files
>> 
>> 
>> I uploaded the sheep.log files of all the sheepdog storage nodes I was using during the above operation to
>> 
>>  http://member.wide.ad.jp/~shima/tmp/sheeplog-201108311426.tgz
>> 
>> 
>> The following is the procedure I used to reproduce the above behavior.
>> 
>> 1. set up a sheepdog cluster of 46 nodes with 3 copies
>> 
>>  collie cluster format --copies=3
>> 
>> 2. create a disk image
>> 
>>  qemu-img create sheepdog:disk00 -o preallocation=data 1G
>> 
>> 3. start the iscsi target (tgtd) on 2001:200:d00:101::92 (corresponds to 172.16.22.92 in the uploaded log file)
>> 
>>  tgtd
>>  tgtadm --op new --mode target --tid 1 --lld iscsi -T iqn.2011-09.jp.ad.wide.cloud.sheepdog.storage.1
>>  tgtadm --op new --mode logicalunit --tid 1 --lun 1 -b disk00 --bstype sheepdog
>>  tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
>> 
>> 4. mount the volume from another PC which is not a part of the sheepdog cluster
>> 
>> 5. make a filesystem and perform read/write operations on the mounted volume
>> 
>> 6. unmount the volume
>>  (sheep starts generating the connection error messages shown above)
>> 
>> 7. run a collie operation on 2001:200:d00:101::92 (corresponds to 172.16.22.92 in the uploaded log file)
>>  (sheep starts generating the 'Too many open files' error messages shown above)
>> 
>> 
>> Are there any suggestions?
> 
> Thanks for the information.  I sent a patch to fix a "too many open
> files" problem just now:
> 
>  http://lists.wpkg.org/pipermail/sheepdog/2011-September/001324.html
> 
> I think this would solve this problem.
> 
> 
> Thanks,
> 
> Kazutaka
> 



