[sheepdog] Fwd: [PATCH 0/4] bugfix for erasure coding recovery

Hitoshi Mitake mitake.hitoshi at lab.ntt.co.jp
Wed Oct 22 03:32:08 CEST 2014


At Tue, 21 Oct 2014 12:18:55 +0200,
Valerio Pachera wrote:
> 
> 2014-10-20 9:07 GMT+02:00 Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>:
> > This patchset fixes a bug in the recovery process. The current recovery
> > process can lose data of erasure-coded VDIs when the number of nodes is
> > smaller than the number of data stripes.
> >
> > The same thing can be found here:
> > https://github.com/sheepdog/sheepdog/tree/ec-recovery
> >
> > Valerio, could you test it?
> 
> It seems to work fine.
> 
> I used -c 2:1 and killed all nodes but one.
> I rejoined the cluster with the second node and checked the md5sum of
> the vdi, and it matches the one calculated before killing the nodes.
> 
> dog vdi read test | md5sum
> 8886bddd205a7698a8194594c76e61b5  -
> 
> dog vdi read test | md5sum
> 8886bddd205a7698a8194594c76e61b5  -

Thanks for your testing, Valerio.
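
For anyone repeating this check, the before/after comparison can be
scripted instead of comparing digests by eye. The following is only a
sketch in Python: it assumes `dog` is on the PATH and that
`dog vdi read <name>` writes the VDI contents to stdout, as in the
commands quoted above. The helper names md5_of_stream and vdi_md5 are
made up for illustration and are not part of sheepdog.

```python
import hashlib
import subprocess


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    # iter() with a sentinel stops once read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def vdi_md5(vdi_name):
    """Checksum a VDI by streaming the output of `dog vdi read <name>`.

    Assumption: `dog vdi read` dumps the whole VDI to stdout, which is
    how it is used in the quoted test.
    """
    cmd = ["dog", "vdi", "read", vdi_name]
    # The context manager closes the pipe and waits for the process to exit.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        checksum = md5_of_stream(proc.stdout)
    if proc.returncode != 0:
        raise RuntimeError("dog vdi read exited with %d" % proc.returncode)
    return checksum
```

Calling vdi_md5("test") once before killing the nodes and once after
recovery finishes, then comparing the two strings, reproduces the manual
test. Reading in chunks keeps memory use flat even for large VDIs.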

> 
> I noticed that a lot of INFO and ERROR messages get printed in sheep.log.
> In my testing environment I have only 1 vdi of 800M.
> In a real cluster with terabytes of data the log would probably become huge.

The error messages below seem to have been introduced by a trivial
mistake. I'll fix it before applying the patchset.

Thanks,
Hitoshi

> 
> ...
> Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  47%
> Oct 21 12:09:18  ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18  ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000085 idx 0
> Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  48%
> Oct 21 12:09:18  ERROR [rw 13514] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18  ALERT [rw 13514] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18  ALERT [rw 13514] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18  ERROR [rw 13514] read_erasure_object(230) can not read 7c2b2500000086 idx 0
> Oct 21 12:09:18  ERROR [rw 14158] sheep_exec_req(1170) failed Failed to find requested tag, remote address: 192.168.10.5:7000, op name: GET_EPOCH
> Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(117) cannot get epoch 0
> Oct 21 12:09:18  ALERT [rw 14158] rollback_vnode_info(118) clients may see old data
> Oct 21 12:09:18  ERROR [rw 14158] read_erasure_object(230) can not read 7c2b2500000088 idx 1
> Oct 21 12:09:18   INFO [main] recover_object_main(908) object recovery progress  50%
> ...


