[sheepdog-users] Stability regression with erasure coding

Valerio Pachera sirio81 at gmail.com
Fri Jul 4 16:54:35 CEST 2014


Hi, I was testing the master branch on a 4-node cluster.

I ran into severe issues when the cluster is formatted with -c 2:1
(erasure coding).

I imported a vdi and ran the guest.
qemu crashed after boot.
Sheep.log was showing:

Jul 04 16:46:25  ERROR [main] check_request_epoch(176) new node version 1,
9 (READ_PEER)
Jul 04 16:46:26  ERROR [rw 19500] do_read(236) failed to read from socket:
-1, Resource temporarily unavailable
Jul 04 16:46:26  ERROR [rw 19500] exec_req(347) failed to read a response

I restarted the guest and it was working.
I then tried to kill a node (obviously not the one the guest was running
on).
After that, I wasn't even able to log in to the guest.

Sheep.log is showing:

Jul 04 16:46:55  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:55  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:55  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:55  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ERROR [rw 19500] do_read(236) failed to read from socket:
-1, Resource temporarily unavailable
Jul 04 16:46:56  ERROR [rw 19500] exec_req(347) failed to read a response
Jul 04 16:46:56  ERROR [rw 19500] recover_replication_object(412) can not
recover oid 87c2b260000008e
Jul 04 16:46:56  ERROR [rw 19500] recover_object_work(576) failed to
recover object 87c2b260000008e
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_number(100) copy number for
fd3815 not found, set 3
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_number(100) copy number for
fd3815 not found, set 3
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:46:56  ERROR [main] check_request_epoch(176) new node version 1,
9 (READ_PEER)
Jul 04 16:47:25  ERROR [main] check_request_epoch(176) new node version 1,
9 (READ_PEER)
Jul 04 16:47:25  ERROR [main] check_request_epoch(176) new node version 1,
9 (READ_PEER)
Jul 04 16:47:25  ERROR [rw 19447] do_read(236) failed to read from socket:
-1, Resource temporarily unavailable
Jul 04 16:47:25  ERROR [rw 19447] exec_req(347) failed to read a response
Jul 04 16:47:25  ERROR [rw 19447] read_erasure_object(228) can not read
fd3815000007f3 idx 0
Jul 04 16:47:25  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:25  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:25  ALERT [rw 19447] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ERROR [rw 19498] do_read(236) failed to read from socket:
-1, Resource temporarily unavailable
Jul 04 16:47:26  ERROR [rw 19498] exec_req(347) failed to read a response
Jul 04 16:47:26  ERROR [rw 19498] read_erasure_object(228) can not read
fd381500000502 idx 2
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:26  ALERT [rw 19498] get_vdi_copy_policy(117) copy policy for
fd3815 not found, set 17
Jul 04 16:47:50  ERROR [gway 23895] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23896] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23890] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23889] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23886] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23867] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23892] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23888] wait_forward_request(438) fail
7c2b2500000208, Node is killed

There is an endless number of these last error messages.
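
Since the log is dominated by repeated entries, a condensed view may be
easier to read. Below is a minimal Python sketch that counts entries per
level and function. It is not part of sheepdog: the names ENTRY,
join_wrapped and summarize are made up for illustration, and the pattern
only assumes the entry layout visible in the excerpts above (timestamp,
level, [thread], function(line), message), including second lines that
were wrapped by the mail client.

```python
import re
from collections import Counter

# Start of a log entry as it appears in the excerpts above, e.g.
# "Jul 04 16:46:25  ERROR [main] check_request_epoch(176) new node ..."
# (assumed layout, inferred from this report only)
ENTRY = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] "
    r"(?P<func>\w+)\((?P<line>\d+)\) (?P<msg>.*)$"
)


def join_wrapped(lines):
    """Rejoin entries whose tail was wrapped onto the following line."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if ENTRY.match(line):
            entries.append(line)
        elif entries and line.strip():
            # Not the start of an entry: treat it as a continuation.
            entries[-1] += " " + line.strip()
    return entries


def summarize(lines):
    """Count entries by (level, function), ignoring thread ids and times."""
    counts = Counter()
    for entry in join_wrapped(lines):
        m = ENTRY.match(entry)
        counts[(m["level"], m["func"])] += 1
    return counts


# A few lines copied from the log above, wrapped as in this mail.
sample = """\
Jul 04 16:47:50  ERROR [gway 23895] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:47:50  ERROR [gway 23896] wait_forward_request(438) fail
7c2b2500000208, Node is killed
Jul 04 16:46:56  ALERT [rw 19498] get_vdi_copy_number(100) copy number for
fd3815 not found, set 3
""".splitlines()

for (level, func), n in summarize(sample).most_common():
    print(f"{n:4d}  {level}  {func}")
```

To run it on a real log, pass the open file instead of the sample, for
example summarize(open(path)), where path points to sheep.log in the
store directory. The sketch expects log lines only; any other text in the
input would be appended to the previous entry.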

Sheepdog daemon version 0.8.0_223_ge4735ba.