[sheepdog] [PATCH v2 08/11] group: add node to sockfd_cache as early as possible

Liu Yuan namei.unix at gmail.com
Wed Sep 18 07:02:47 CEST 2013


On Wed, Sep 18, 2013 at 01:24:33PM +0900, Hitoshi Mitake wrote:
> At Wed, 18 Sep 2013 11:52:16 +0800,
> Liu Yuan wrote:
> > 
> > On Wed, Sep 18, 2013 at 12:34:36AM +0900, Hitoshi Mitake wrote:
> > > At Sat, 14 Sep 2013 18:34:28 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > get_vdis will try to connect to all the nodes include joining node, so before
> > > > calling get_vdis, we'd better add joining node in the sockfd cache. This is not
> > > > a fatal problem and sockfd cache can actually sort it out with the old code, but
> > > > this patch will safe us unnecessary revalidating node after grab failure.
> > > > 
> > > > Signed-off-by: Liu Yuan <namei.unix at gmail.com>
> > > > ---
> > > >  sheep/group.c |    3 +--
> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > 
> > > > diff --git a/sheep/group.c b/sheep/group.c
> > > > index 660fc5a..0b03150 100644
> > > > --- a/sheep/group.c
> > > > +++ b/sheep/group.c
> > > > @@ -633,6 +633,7 @@ static void update_cluster_info(const struct cluster_info *cinfo,
> > > >  
> > > >  	if (node_is_local(joined))
> > > >  		sockfd_cache_add_group(nroot);
> > > 
> > > We should add else here. If the joined node is local, the newly
> > > created connections will not be useful.
> > > 
> > 
> > Yes for now, but the erasure code patch series I am currently working on will
> > need to add itself to the sockfd cache. So let's keep it as is in this patch.
> 
> If the erasure coding cannot be implemented without connecting to
> local nodes and it cannot be an optional thing, the "else" shouldn't
> be added. But it is hard to imagine such an implementation.
> 
> I think eliminating the else after adding the erasure coding isn't too
> late.

I'll post the patch set probably next week and I've already removed local node
shortcuts for read/write. 

/*
 * Stripe: data strips + parity strips, spread on all replica
 * DS: data strip
 * PS: parity strip
 * R: Replica
 *
 *  +--------------------stripe ----------------------+
 *  v                                                 v
 * +----+----------------------------------------------+
 * | ds | ds | ds | ds | ds | ... | ps | ps | ... | ps |
 * +----+----------------------------------------------+
 * | .. | .. | .. | .. | .. | ... | .. | .. | ... | .. |
 * +----+----+----+----+----+ ... +----+----+-----+----+
 *  R1    R2   R3   R4   R5   ...   Rn  Rn+1  Rn+2  Rn+3
 */

In my patch set, I use replica to hold data and parity strips. Suppose we have a
4:2 scheme, 4 data strips and 2 parity strips on 6 replica and strip size = 1k,
so basically we'll generate 2k parites for each 4k write, we call this 6K as
stripe as a whole. For write, we'll horizontally spread data, not vertically as
replciation. It is very convenient to treat every replica as equal node using
sockfd to connect to, in this way we can simplify the write in a for loop. I
agree that we can still do shortcut write to local node with more complicated
code though.

Anyway, I don't want to add 'else' in this patch because:
- this is another issue than what this patch tries to solve
- even if we need to add else later, we should get it done with another patch

Thanks
Yuan



More information about the sheepdog mailing list