[sheepdog] Fwd: [PATCH 0/4] bugfix for erasure coding recovery

Valerio Pachera sirio81 at gmail.com
Tue Oct 21 12:18:55 CEST 2014


2014-10-20 9:07 GMT+02:00 Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>:
> This patchset removes a bug in recovery process. Current recovery
> process can lose data of erasure coded VDIs when a number of nodes is
> smaller than a number of data stripes.
>
> The same thing can be found here:
> https://github.com/sheepdog/sheepdog/tree/ec-recovery
>
> Valerio, could you test it?

It seems to work fine.

I used -c 2:1 and killed all nodes but one.
I then rejoined the cluster with the second node and checked the md5sum of
the vdi; it matches the one calculated before killing the nodes.

dog vdi read test | md5sum
8886bddd205a7698a8194594c76e61b5  -

dog vdi read test | md5sum
8886bddd205a7698a8194594c76e61b5  -
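
The before/after comparison above can also be scripted. What follows is a
minimal sketch in Python, assuming `dog` is on the PATH and the vdi is named
`test` as in the commands above. The helper names `md5_of_stream` and
`vdi_md5` are hypothetical and not part of sheepdog.

```python
import hashlib
import subprocess


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def vdi_md5(vdi_name):
    """Hash the output of `dog vdi read <vdi_name>`, like piping to md5sum."""
    proc = subprocess.Popen(
        ["dog", "vdi", "read", vdi_name], stdout=subprocess.PIPE
    )
    try:
        checksum = md5_of_stream(proc.stdout)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("dog vdi read exited with status %d" % returncode)
    return checksum
```

Calling `vdi_md5("test")` once before killing the nodes and once after the
recovery completes, then comparing the two strings, reproduces the manual
check. Reading in chunks keeps memory use flat even for a large vdi.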

I noticed that a lot of INFO and ERROR messages get printed in sheep.log.
In my testing environment I have only one vdi of 800M.
In a real cluster with terabytes of data the log would probably become huge.

...
Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  47%
Oct 21 12:09:18  ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
Oct 21 12:09:18  ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000085 idx 0
Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  48%
Oct 21 12:09:18  ERROR [rw 13514] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
Oct 21 12:09:18  ALERT [rw 13514] rollback_vnode_info(117) cannot get epoch 0
Oct 21 12:09:18  ALERT [rw 13514] rollback_vnode_info(118) clients may see old data
Oct 21 12:09:18  ERROR [rw 13514] read_erasure_object(230) can not read 7c2b2500000086 idx 0
Oct 21 12:09:18  ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
Oct 21 12:09:18  ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000088 idx 1
Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  50%
...
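
To put a number on how noisy the log is, the messages can be tallied by level
and function. The sketch below is in Python and rests on two assumptions: that
each message occupies a single line in the real sheep.log (the wrapping in the
excerpt above comes from the mail client), and that the line layout is the one
seen in this excerpt. The names `LINE_RE` and `tally` are hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the excerpt only, e.g.:
#   Oct 21 12:09:18  ERROR [rw 14158] read_erasure_object(230) can not read ...
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+"  # timestamp: month, day, time
    r"(?P<level>[A-Z]+)\s+"                  # INFO, ERROR, ALERT, ...
    r"\[(?P<thread>[^\]]+)\]\s+"             # [main], [rw 14158], ...
    r"(?P<func>\w+)\(\d+\)"                  # function name and source line
)


def tally(lines):
    """Count log messages per (level, function); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("level"), match.group("func"))] += 1
    return counts
```

Passing an open file object for sheep.log to `tally` and printing
`tally(f).most_common()` lists the most frequent message sources first, which
shows whether the repeated GET_EPOCH errors dominate the file.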


