[sheepdog-users] err_to_sderr(xxx) - Too many open files & ERROR [gway xxx] wait_forward_request(437) fail 56c6d200000342, Network error between sheep

Wed Aug 10 12:47:30 CEST 2016

Hi,

I have a relatively small cluster.
Debian Jessie, Sheepdog 0.8.4 with corosync (both as shipped with the distribution)
4 nodes with storage
2 nodes in gateway mode

About 12 active QEMU VMs (Windows & Linux)

After a while, the following errors start to show up in the logs.
On nodes with storage and VMs:
Aug 10 12:13:00  ERROR [io 32282] err_to_sderr(110) Too many open files, oid=a3923c000010ba

On gateway nodes:
ERROR [gway 3986] wait_forward_request(437) fail 56c6d200000342, Network error between sheep

It seems to be related to the load on all machines. By themselves, these errors don't seem to have a noticeable impact on the operation and stability of the cluster.
But after a while the number of errors starts to increase, up to hundreds per second; then one node blocks and the whole cluster is on hold. After stopping the node that blocks everything, the cluster continues to work, but all VMs on that node had to be destroyed (forcibly turned off).

The last time this happened, we also had data loss (log from another node):
ALERT [rw] fetch_object_list(933) some objects may be not recovered at epoch 86

I experimented with ulimit -SHn 1048576, but this had no effect.
lsof | wc -l returns values between 20000 and 150000.
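For reference when diagnosing this: running ulimit in a shell changes the limit only for that shell and for processes started from it afterwards, so a sheep daemon that was already running (or was started by an init script) most likely keeps its original limit. Also, lsof | wc -l counts entries for every process on the host and includes non-descriptor entries such as memory-mapped files, so it is not directly comparable to the per-process "open files" limit. The following is a minimal sketch, assuming Linux with /proc, that compares one process's open descriptor count with the limit it is actually running under. The helper names are illustrative, not part of sheepdog.

```python
import os
import sys
from typing import Tuple


def open_fd_count(pid: int) -> int:
    """Count the file descriptors a process currently holds open (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits(pid: int) -> Tuple[str, str]:
    """Return the (soft, hard) 'Max open files' limits of a running process.

    These are the limits the process is running with right now; calling
    `ulimit` in a shell afterwards does not change them.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Layout: "Max open files   <soft>   <hard>   files"
                fields = line.split()
                return fields[3], fields[4]
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


if __name__ == "__main__":
    # Pass the sheep daemon's PID, e.g. from `pidof sheep`
    # (assumption: one sheep process per node); defaults to this script.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = nofile_limits(pid)
    print(f"pid {pid}: {open_fd_count(pid)} open fds, "
          f"soft limit {soft}, hard limit {hard}")
```

Run it as root with the sheep PID (reading another process's fd directory needs sufficient privileges). If the open descriptor count sits close to the soft limit while the limit is still the old default, the new ulimit value never reached the daemon and would have to be applied where sheep is started.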

Question: Is this a known bug that is fixed in a newer version? Are there any parameters I can use to solve this problem?
I would like to continue using sheepdog, but the freezes and the data loss are showstoppers.

Regards,
Oliver Günther

*---*
Oliver Günther
CS Computer & Service GmbH
Banksstrasse 4
20097 Hamburg
fon: +49 40 8818070
fax: +49 40 88180717
web: www.cs-gmbh.net
mail: info at cs-gmbh.net

Geschäftsführer: Oliver Günther