[sheepdog-users] Automatic disconnection of client during recovery

Valerio Pachera sirio81 at gmail.com
Fri Dec 13 08:28:49 CET 2013


2013/12/12 Liu Yuan <namei.unix at gmail.com>

> I think it is risky to run sheepdog with a heterogeneous NIC setup, since
> when I wrote the dual NIC support I didn't take this case into account,
> though I guess it is not hard to support it. But for now, we don't look at
> the issue (yet).
>

My goal was to check whether adding a node with a single NIC crashes the
whole cluster.
It happened once (on an older sheep version), and I wanted to know whether
it still happens and is repeatable on the latest version.
I was then going to open a bug on Launchpad, as suggested by Hitoshi Mitake,
reporting the messages in sheep.log (that's why I cleaned my sheep.log).


>
> Which node sees itself as a single node, and what about the views on the
> others? It seems this node was split from the others.
>

It's different.
I have seen the split-brain situation before, where all the nodes were in
the cluster but one was aware of itself only.
This time test004, test005 and test007 were all printing "test007" as the
only member of the cluster.
That's why I consider this bizarre.
It's the first time I have seen something like that.

Yesterday, before leaving work, I re-added the other nodes (except
test006) and let them recover.
This morning I see this:

2013-12-12 17:12:29     38 [192.168.2.44:7000, 192.168.2.45:7000]
2013-12-12 17:12:27     37 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.47:7000]
2013-12-12 17:12:29     36 [192.168.2.45:7000, 192.168.2.47:7000]
2013-12-12 16:15:07     35 [192.168.2.47:7000] (<- when I wrote the mail yesterday)

root@test004:~# grep -v recover_object_main /var/sheep/sheep.log | \
grep -v 'No object found' | grep -v 'No such file or directory'

Dec 12 17:11:11   INFO [main] main(888) shutdown
Dec 12 17:12:24   INFO [main] md_add_disk(310) /mnt/sheep/dsk02, vdisk nr 233, total disk 1
Dec 12 17:12:24   INFO [main] send_join_request(777) IPv4 ip:192.168.2.44 port:7000
Dec 12 17:12:26   INFO [main] check_host_env(477) Allowed open files 1024000, suggested 6144000
Dec 12 17:12:26   INFO [main] main(881) sheepdog daemon (version 0.7.0_197_g9f718d2) started
Dec 12 17:12:29  ERROR [rw] sheep_exec_req(933) failed Network error between sheep
Dec 12 17:12:29  ERROR [rw] sheep_exec_req(933) failed Network error between sheep
Dec 12 17:12:29  ERROR [rw] sheep_exec_req(933) failed Network error between sheep
Dec 12 17:12:29  ALERT [rw] recover_replication_object(371) cannot access any replicas of 80d398ec00000000 at epoch 35
Dec 12 17:12:29  ALERT [rw] recover_replication_object(372) clients may see old data
Dec 12 17:12:29  ERROR [rw] sheep_exec_req(933) failed Network error between sheep
Dec 12 17:12:29  ERROR [rw] sheep_exec_req(933) failed Request has an old epoch
Dec 12 17:12:29  ERROR [rw] recover_object_work(531) failed to recover object 80d398ec00000000
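
As a side note, the chain of `grep -v` filters used above can also be written
as a single grep invocation with several `-e` patterns. This is only a minimal
sketch, assuming the same three strings are the ones to drop; the function name
filter_sheep_log is hypothetical and not part of sheepdog.

```shell
#!/bin/sh
# Hypothetical helper (not part of sheepdog): print a sheep log without the
# noisy recovery lines, using one grep with several -e patterns.
filter_sheep_log() {
    # -v inverts the match, -F treats each pattern as a fixed string, and
    # each -e adds one pattern; a line is dropped if it matches any of them.
    grep -v -F \
        -e 'recover_object_main' \
        -e 'No object found' \
        -e 'No such file or directory' \
        "$1"
}
```

It would be called as `filter_sheep_log /var/sheep/sheep.log` and should print
the same lines as the three-stage pipeline, since a line is excluded as soon as
it contains any one of the strings.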