[sheepdog] [PATCH V4 2/2] collie: optimize 'collie vdi check' command

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Sep 3 15:29:21 CEST 2012


At Mon, 3 Sep 2012 21:02:22 +0800,
Yunkai Zhang wrote:
> 
> On Mon, Sep 3, 2012 at 8:40 PM, MORITA Kazutaka
> <morita.kazutaka at lab.ntt.co.jp> wrote:
> > At Mon, 3 Sep 2012 20:11:48 +0800,
> > Yunkai Zhang wrote:
> >>
> >> On Mon, Sep 3, 2012 at 7:46 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> >> > On 09/03/2012 07:12 PM, Yunkai Zhang wrote:
> >> >> On Mon, Sep 3, 2012 at 7:03 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> >> >>> On 09/03/2012 06:54 PM, Yunkai Zhang wrote:
> >> >>>> We can compare this result with comand line tools: sha1sum, which can
> >> >>>> also help us to verify program's correction I think.
> >> >>>
> >> >>> Could you specify a use case for this? I have no idea how end users can
> >> >>> make use of sha1 string. For a crashed cluster, I think want users need
> >> >>> to do is, run check to restore cluster consistency by one command like fsck.
> >> >>
> >> >> Our root problem is just why we need to move sha1_to_hex from
> >> >> farm/sha1_file.c to lib/sha1.c, does this change bother you?
> >> >>
> >> >
> >> > Of course, it is ugly. Sha1 is only used in Farm, moving it out of Farm implementation
> >> > is nonsense to me.
> >> >
> >> >> Show sha1 string will make the output more meaningful for us, why need
> >> >> to a use case? It's a so small/simple thing, do you like argue
> >> >> everything?
> >> >>
> >> >
> >> > 1) it is my duty to review patches and whoever tries to move sha1_to_hex() out of
> >> > sha1_file.c, I will ask why he needs it and verify if it is a reasonable movement.
> >> > 2) it is you who always take patch comments as malicious argument which is meant to
> >> > be against you but in fact only meaningful to patch itself.
> >>
> >> You misunderstand me, I didn't take any comments as malicious argument
> >> for me, what I think is that you spend so much time to discuss those
> >> triviality.
> >>
> >> Don't you think 'Move sha1.c from farm/sha1_file.c to lib/sha1.c' is
> >> worth discussing? Time is very precious for us, does it?
> >
> > If we need the function outside farm, it is okay to move it to
> > lib/sha1.c.  I think what Yuan want to know is that how it will help
> > users to print sha1 values in the collie outputs.  What is the
> > situation users want to compare hash strings?
> 
> What the original purpose of this patch is to let user see the
> different replicas of inconsistent object, instead of repairing it
> directly.
> 
> This patch can help us to test cluster's consistency:
> 1. We need not to write shell script to read raw objects and compare
> it from each node, especially in a big cluster, using shell to compare
> is not efficient.

So why do you need to compare each hash strings?  What users want to
know is not hash strings, but only whether there is object
inconsistency or not, no?

> 
> 2. This patch can help me to test code when develop program, it can
> use to check 'delay recovery' which may lead to inconsistency when
> implement incorrectly. Printing sha1 in output, and comparing this
> value with sha1sum's output, It can help us to verity program's
> correction (I have mentioned it in previous email).

The design of delay recovery we agreed before is:
 - Basically, a recovery process doesn't recover objects even if epoch
   is incremented
 - But when clients (qemu, collie, other sheeps) access objects, the
   recovery process recover the objects.  This is necessary for
   clients to read the latest objects.

So 'collie vdi check' must read valid objects.  I think we cannot use
the command for debugging delay recovery.  IMHO, running md5sum
directly against stored objects on filesystem looks much simpler and
useful for debugging.

Thanks,

Kazutaka



More information about the sheepdog mailing list