[sheepdog] RFC: update protocol version or redesign struct sd_node

Liu Yuan namei.unix at gmail.com
Tue Jan 14 06:43:01 CET 2014


On Tue, Jan 14, 2014 at 02:34:56PM +0900, Hitoshi Mitake wrote:
> At Tue, 14 Jan 2014 14:09:42 +0900,
> Hitoshi Mitake wrote:
> > 
> > At Tue, 14 Jan 2014 12:06:24 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Tue, Jan 14, 2014 at 01:00:15PM +0900, Hitoshi Mitake wrote:
> > > > At Tue, 14 Jan 2014 11:34:03 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > On Tue, Jan 14, 2014 at 12:00:33PM +0900, Hitoshi Mitake wrote:
> > > > > > 
> > > > > > The commit 9ea90be39e9d (sheep: use rbtree to manage struct sd_node)
> > > > > > changes struct sd_node. So the master branch isn't compatible with
> > > > > > v0.7.x because it affects the format of GET_NODE_LIST's returned
> > > > > > buffer. Every dog command which has a flag CMD_NEED_NODELIST cannot
> > > > > > work (users see nodes have IPv6 addresses).
> > > > > 
> > > > > Why you need v0.7.x dog to communite with latest sheep? There are many problems
> > > > > other than imcompatibility of struct sd_node
> > > > 
> > > > I don't need. Communication should be forbidden. Or if it is allowed,
> > > > the different format problem shuold be solved.
> > > > 
> > > > > 
> > > > > > The affection is very serious so I propose either of the below two
> > > > > > changes:
> > > > > 
> > > > > serious for what usage which can't be workaround and the only solution is to
> > > > > adopt following 2?
> > > > 
> > > > I'm not saying the 2nd one is the only solution. I don't have strong
> > > > opinions about this issue. Simply forbidding communication between
> > > > 0.7.x dog and the latest sheep is enough.
> > > 
> > > Oops, I misunderstood you and apologies for it.
> > > 
> > > > 
> > > > > 
> > > > > > 1. change protocol version and forbid communication between 0.7.x dog
> > > > > >    and the latest sheep
> > > > > > 
> > > > > > 2. change struct sd_node and exclude 	struct rb_node  rb; from the
> > > > > >    members
> > > > > > 
> > > > > > I think 2 would be a better option. sd_node is a data structure which
> > > > > 
> > > > > > can be exchanged via network and such a data structure shouldn't have
> > > > > > pointers as its members. Current sd_node is really ugly.
> > > > > 
> > > > > Talk is cheap, feel free to post a patch that fullfill current features:
> > > > > 1. allow 6k+ nodes (100k+ vnodes) manage effeciently (at least don't degrade)
> > > > > 2. don't break current usage (we already have a production cluster based on
> > > > >    current latest master, so your patch should work with current data without
> > > > >    re-format or any tricky upgrade. We want, at least for v0.8.x
> > > > >    series, we can get a smooth update for furture.
> > > > 
> > > > OK, I'll not follow the 2nd option.
> > > > 
> > > > > 
> > > > > There are many ugliness in sheep and also many existing problems. My advice is
> > > > > solve problems at first and don't invest time on something that changing core
> > > > > code just for something you think of beatiful. Chanign coring is always at risk
> > > > > to introduce new bugs and need time to stablize. I introduce rb_tree to manage
> > > > > vnodes because we have to use it to manage more than 10 thousands of vnodes, which
> > > > > old array approach can't handle well. So what is your motivation besides 'ugliness'?
> > > > 
> > > > Generally speaking, data structures which can be sent/recv network
> > > > shouldn't contain pointers. That's all.
> > > 
> > > Yes, I admited it was ugly but at the time I tried to remove 1024 max nodes, this
> > > is the most easy way to solve the problem and I couldn't manage to not include
> > > in the sd_node or sd_vnode for a tree management.
> > > 
> > > > Do you agree with incrementing protocol version?
> > > 
> > > Yes, I do agree.
> > 
> > OK, I'll send a patch for it later.
> 
> I noticed that current policy of sheep's protocol version matching is
> like below:
> 
> if (hdr->proto_ver > SD_PROTO_VER) {
> 	rsp->result = SD_RES_VER_MISMATCH;
> 	goto done;
> }

I think you should consider increment SD_SHEEP_PROTO_VER in include/internal_proto.h
which controle protocol between sheep and dog.

> 
> # from queue_request()
> 
> Incrementing the protocol version number cannot break compatibility
> between sheep and clients.
> 
> I think it should be changed as more strict one like this:
> 
> if (hdr->proto_ver != SD_PROTO_VER) {
> 	rsp->result = SD_RES_VER_MISMATCH;
> 	goto done;
> }
> 
> This policy also requires changes to QEMU and tgt. But it should be
> done because of the change of sd_node layout.

sd_node is just internal state of sheep daemon, nothing to do with tgt or QEMU.
Unless include/sheepdog_proto.h has something to break compatiblity, we shouldn't
try to break it.

Thanks
Yuan


More information about the sheepdog mailing list