[sheepdog] [PATCH] recovery: notify completion only when all objects are fresh

Fri May 31 20:19:05 CEST 2013

On 05/31/2013 10:10 PM, MORITA Kazutaka wrote:
> At Fri, 31 May 2013 21:50:56 +0800,
> Liu Yuan wrote:
>>
>> On 05/31/2013 08:55 PM, MORITA Kazutaka wrote:
>>> To reduce the risk of data loss, we shouldn't remove stale objects if
>>> any sheep has failed to recover objects.
>>
>> So once it is set to true, we'll never get a chance to purge stale
>> objects? This looks kind of unacceptable to me.
>>
>> I think we should stop notification for this one recovery only, and
>> the next recovery should work as normal.
> 
> Once object recovery has failed, we cannot guarantee that none of the
> objects in the working directory is stale, even if the next recovery
> succeeds.  For example,
> 
> - epoch 1, node [A, B]
>   Node A has object o.
> 
> - epoch 2, node [A, B, C]
>   Object o is moved to Node C, and o is updated to o'.
> 
> - epoch 3, node [A, B, C, D]
>   Node D tries to recover the object o' from C but fails.  Then node C
>   reads the object o (stale) from node A.  (Node D is in safe mode.)
> 

Why did recovering o' from C fail? And with multiple node events, won't
stop_notify easily turn into a false positive?

> - epoch 4, node [A, B, C, D, E]
>   Node E reads the object o (stale) from node D.  After all the nodes
>   finish object recovery, node C removes the latest object o' in the
>   stale directory if we allow node D to notify recovery completion.
> 
> I think there is no way to recover automatically from the safe_mode
> state.  The risk of data loss is not acceptable to me.  As long as the
> risk exists, we must not remove the stale directory.  In this case,
> the user has to look into why it happened and restart the sheep daemon
> after the problem is fixed.
> 

Even if you don't remove stale objects, it is not easy for users to
recover the *right* objects. How can users tell which one is right? This
will just give users more problems than it solves.

Most users will simply use 'vdi check' to restore consistency. Assuming
manual recovery is not realistic for ordinary users. What we really
need, IMHO, is to teach 'vdi check' to try as hard as possible to
recover the latest objects. By the way, with multiple copies, it seems
very unlikely that an object ends up inconsistent just because of
recovery. I'd like to see a test case that demonstrates a real case (not
a single copy) and to consider solutions for that, rather than targeting
an imaginary case.
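
To make the quoted epoch 1-4 scenario easier to discuss, here is a toy
model of it in Python. It is only a sketch and not sheepdog code: the
Node class, run_scenario(), the notify_despite_failure flag and the
integer version numbers are all hypothetical, it models a single copy
only, and it assumes that in epoch 3 node D ends up holding the stale o
taken from node A.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one sheep: a working dir and a stale dir."""
    name: str
    working: dict = field(default_factory=dict)  # object id -> version
    stale: dict = field(default_factory=dict)    # object id -> version
    recovery_failed: bool = False


def latest_version(nodes, oid):
    """Highest version of oid still stored anywhere in the cluster."""
    versions = [d[oid] for n in nodes for d in (n.working, n.stale) if oid in d]
    return max(versions, default=None)


def run_scenario(notify_despite_failure):
    a, b, c, d, e = (Node(name) for name in "ABCDE")

    # epoch 1, node [A, B]: node A has object o (version 1).
    a.working["o"] = 1

    # epoch 2, node [A, B, C]: o moves to C and is updated to o' (version 2).
    # A's old copy is kept in its stale directory.
    a.stale["o"] = a.working.pop("o")
    c.working["o"] = 2

    # epoch 3, node [A, B, C, D]: o now belongs to D, so C's o' goes stale.
    # D fails to recover o' from C and (assumption) gets the stale o from A.
    c.stale["o"] = c.working.pop("o")
    d.recovery_failed = True
    d.working["o"] = a.stale["o"]

    # epoch 4, node [A, B, C, D, E]: E reads the stale o from D.
    d.stale["o"] = d.working.pop("o")
    e.working["o"] = d.stale["o"]

    nodes = [a, b, c, d, e]
    any_failed = any(n.recovery_failed for n in nodes)

    # Stale directories are purged once every node has notified completion.
    if notify_despite_failure or not any_failed:
        for n in nodes:
            n.stale.clear()

    return latest_version(nodes, "o")


if __name__ == "__main__":
    print("purge after failed recovery:", run_scenario(True))
    print("keep stale dirs after failure:", run_scenario(False))
```

In this model, purging after the failed recovery leaves only version 1
in the cluster, while keeping the stale directories still leaves o'
(version 2) on node C. The version numbers are a modelling convenience;
nothing in this thread says the stored objects carry one, which is the
"which one is right" problem raised above.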

Thanks,
Yuan