[sheepdog] [PATCH v2 08/11] group: add node to sockfd_cache as early as possible

Hitoshi Mitake mitake.hitoshi at gmail.com
Wed Sep 18 07:31:37 CEST 2013


At Wed, 18 Sep 2013 13:02:47 +0800,
Liu Yuan wrote:
> 
> On Wed, Sep 18, 2013 at 01:24:33PM +0900, Hitoshi Mitake wrote:
> > At Wed, 18 Sep 2013 11:52:16 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Wed, Sep 18, 2013 at 12:34:36AM +0900, Hitoshi Mitake wrote:
> > > > At Sat, 14 Sep 2013 18:34:28 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > get_vdis will try to connect to all the nodes include joining node, so before
> > > > > calling get_vdis, we'd better add joining node in the sockfd cache. This is not
> > > > > a fatal problem and sockfd cache can actually sort it out with the old code, but
> > > > > this patch will safe us unnecessary revalidating node after grab failure.
> > > > > 
> > > > > Signed-off-by: Liu Yuan <namei.unix at gmail.com>
> > > > > ---
> > > > >  sheep/group.c |    3 +--
> > > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/sheep/group.c b/sheep/group.c
> > > > > index 660fc5a..0b03150 100644
> > > > > --- a/sheep/group.c
> > > > > +++ b/sheep/group.c
> > > > > @@ -633,6 +633,7 @@ static void update_cluster_info(const struct cluster_info *cinfo,
> > > > >  
> > > > >  	if (node_is_local(joined))
> > > > >  		sockfd_cache_add_group(nroot);
> > > > 
> > > > We should add else here. If the joined node is local, the newly
> > > > created connections will not be useful.
> > > > 
> > > 
> > > Yes for now, but the erasure code patch series I am currently working on will
> > > need to add itself to the sockfd cache. So let's keep it as is in this patch.
> > 
> > If the erasure coding cannot be implemented without connecting to
> > local nodes and it cannot be an optional thing, the "else" shouldn't
> > be added. But it is hard to imagine such an implementation.
> > 
> > I think eliminating the else after adding the erasure coding isn't too
> > late.
> 
> I'll post the patch set probably next week and I've already removed local node
> shortcuts for read/write. 

Great! I'm looking forward to seeing your patchset :)

> 
> /*
>  * Stripe: data strips + parity strips, spread on all replica
>  * DS: data strip
>  * PS: parity strip
>  * R: Replica
>  *
>  *  +--------------------stripe ----------------------+
>  *  v                                                 v
>  * +----+----------------------------------------------+
>  * | ds | ds | ds | ds | ds | ... | ps | ps | ... | ps |
>  * +----+----------------------------------------------+
>  * | .. | .. | .. | .. | .. | ... | .. | .. | ... | .. |
>  * +----+----+----+----+----+ ... +----+----+-----+----+
>  *  R1    R2   R3   R4   R5   ...   Rn  Rn+1  Rn+2  Rn+3
>  */
> 
> In my patch set, I use replica to hold data and parity strips. Suppose we have a
> 4:2 scheme, 4 data strips and 2 parity strips on 6 replica and strip size = 1k,
> so basically we'll generate 2k parites for each 4k write, we call this 6K as
> stripe as a whole. For write, we'll horizontally spread data, not vertically as
> replciation. It is very convenient to treat every replica as equal node using
> sockfd to connect to, in this way we can simplify the write in a for loop. I
> agree that we can still do shortcut write to local node with more complicated
> code though.
> 
> Anyway, I don't want to add 'else' in this patch because:
> - this is another issue than what this patch tries to solve
> - even if we need to add else later, we should get it done with another patch

OK, removing the connections to local nodes would be a trivial
work. Let's simplify this patchset and your erasure coding one.

I agree with not adding the else in update_cluster_info().

Thanks,
Hitoshi



More information about the sheepdog mailing list