[sheepdog] [PATCH] recovery: notify completion only when all objects are fresh

Sun Jun 2 13:25:24 CEST 2013

On 06/02/2013 03:09 AM, MORITA Kazutaka wrote:
> For the manual recovery, we have to read sheep.log carefully and
> determine which object is the correct one in the stale objects.
> Actually, I did this several months ago on some user environment.  At
> that time, I could recover objects because their sheepdog crashed and
> the stale directories were not cleaned up.  The crash reason is fixed
> in the current sheepdog, so if they would use the latest version of
> sheepdog, I couldn't fix their environment.
> 
> What this patch tries to do is just only giving the chance for users
> who have deep knowledge of sheepdog object recovery.  If you think we
> shouldn't include such feature, it's okay for me to keep this change
> for my own tree.  Perhaps, what I should add is rather a documentation
> about the risk of data loss.

So it is better to keep it as an out-of-tree patch, for users who are
able to do advanced manual recovery and can tolerate cluster downtime
caused by a few unrecoverable broken or stale objects.

In my opinion, shutting down the cluster is the worst solution for
production. Most users can tolerate some vdis being broken and, in the
worst case, remove the broken vdis, as long as the good vdis survive
and the service isn't stopped at all. With backup tools in mind, such
as vdi backup or cluster-wide backup, the unrecoverable objects can be
recovered via more user-friendly backup tools.

Manual backup looks to me like a much more reliable solution to the
problem of unrecoverable broken and stale objects, because service
uptime is as important as data reliability.

Thanks,
Yuan