[sheepdog] [PATCH v2] sheep/recovery: reduce network overhead of preparing object list

Fri Jul 11 04:22:06 CEST 2014

On Wed, Jul 09, 2014 at 05:00:11PM +0800, Ruoyu wrote:
> Hi Yuan,
> 
> I am sorry, but a critical bug in the patch is CONFIRMED.
> 
> Although the probability is rather low, the patch will cause the VM
> file system to crash.
> 
> Suppose that there are 3 nodes in a cluster: A, B and C. A is fast
> and B is slow. Now C leaves unexpectedly. Let us focus on node B: if
> the GET_OBJ_LIST request from A is received before the zookeeper
> EVENT_LEAVE message, B will return the object list for the old epoch
> to A. However, what node A wants is the object list for the new
> epoch. Therefore, some objects cannot be recovered. That is the
> problem.
> 
> To simplify the scenario, I think it is better to use the old
> algorithm: fetch all object lists first, and then check which
> objects belong to the node itself.
> 
> Could you please help to revert the patch?

Done.

Thanks
Yuan
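
For readers of the archive, the race described in the quoted message can be
reproduced with a small simulation. This is only a sketch, not sheepdog code.
It assumes that the reverted patch had the responding node filter its object
list for the requester, which is inferred from the description of the old
algorithm above and is not confirmed by the thread. The placement function
owners(), the copy count of 2, and both reply helpers are hypothetical names
made up for this illustration; sheepdog's real placement logic differs.

```python
import hashlib

def owners(oid, members, copies=2):
    """Toy placement (an assumption, not sheepdog's algorithm):
    rank nodes by a hash of (oid, node) and keep the first `copies`."""
    def weight(node):
        return hashlib.sha1(f"{oid}:{node}".encode()).hexdigest()
    return set(sorted(members, key=weight)[:copies])

OLD_EPOCH = ["A", "B", "C"]   # membership before C leaves
NEW_EPOCH = ["A", "B"]        # membership after C leaves
OBJECTS = range(64)

# Objects that B held while all three nodes were alive.
stored_on_b = [o for o in OBJECTS if "B" in owners(o, OLD_EPOCH)]

# Objects that A must fetch from B: A owns them in the new epoch
# but did not own them in the old one.
needed = {o for o in stored_on_b
          if "A" in owners(o, NEW_EPOCH) and "A" not in owners(o, OLD_EPOCH)}

def reply_filtered_by_sender(stored, requester, sender_view):
    """Hypothetical patched behaviour: the sender returns only the objects
    that belong to the requester according to the sender's own view."""
    return [o for o in stored if requester in owners(o, sender_view)]

def reply_full_list(stored):
    """Old behaviour: the sender returns everything it holds."""
    return list(stored)

# Case 1: B has not yet seen EVENT_LEAVE, so it filters with the old epoch.
stale_reply = set(reply_filtered_by_sender(stored_on_b, "A", OLD_EPOCH))
lost = needed - stale_reply

# Case 2: old algorithm. B sends its full list and A filters locally
# with the new epoch, so B's stale view no longer matters.
full_reply = reply_full_list(stored_on_b)
kept_by_a = {o for o in full_reply if "A" in owners(o, NEW_EPOCH)}
missing = needed - kept_by_a

print("objects A needs from B:", len(needed))
print("lost with sender-side filtering on a stale epoch:", len(lost))
print("missing with receiver-side filtering:", len(missing))
```

In this sketch every object A needs is dropped by the stale sender-side filter,
while the full-list approach misses none of them, at the cost of sending a
longer list over the network.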