[sheepdog] RFC: update protocol version or redesign struct sd_node

Hitoshi Mitake mitake.hitoshi at gmail.com
Tue Jan 14 07:57:16 CET 2014


At Tue, 14 Jan 2014 13:43:01 +0800,
Liu Yuan wrote:
> 
> On Tue, Jan 14, 2014 at 02:34:56PM +0900, Hitoshi Mitake wrote:
> > At Tue, 14 Jan 2014 14:09:42 +0900,
> > Hitoshi Mitake wrote:
> > > 
> > > At Tue, 14 Jan 2014 12:06:24 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Tue, Jan 14, 2014 at 01:00:15PM +0900, Hitoshi Mitake wrote:
> > > > > At Tue, 14 Jan 2014 11:34:03 +0800,
> > > > > Liu Yuan wrote:
> > > > > > 
> > > > > > On Tue, Jan 14, 2014 at 12:00:33PM +0900, Hitoshi Mitake wrote:
> > > > > > > 
> > > > > > > The commit 9ea90be39e9d (sheep: use rbtree to manage struct sd_node)
> > > > > > > changes struct sd_node. So the master branch isn't compatible with
> > > > > > > v0.7.x because it affects the format of GET_NODE_LIST's returned
> > > > > > > buffer. Every dog command which has a flag CMD_NEED_NODELIST cannot
> > > > > > > work (users see nodes have IPv6 addresses).
> > > > > > 
> > > > > > Why you need v0.7.x dog to communite with latest sheep? There are many problems
> > > > > > other than imcompatibility of struct sd_node
> > > > > 
> > > > > I don't need. Communication should be forbidden. Or if it is allowed,
> > > > > the different format problem shuold be solved.
> > > > > 
> > > > > > 
> > > > > > > The affection is very serious so I propose either of the below two
> > > > > > > changes:
> > > > > > 
> > > > > > serious for what usage which can't be workaround and the only solution is to
> > > > > > adopt following 2?
> > > > > 
> > > > > I'm not saying the 2nd one is the only solution. I don't have strong
> > > > > opinions about this issue. Simply forbidding communication between
> > > > > 0.7.x dog and the latest sheep is enough.
> > > > 
> > > > Oops, I misunderstood you and apologies for it.
> > > > 
> > > > > 
> > > > > > 
> > > > > > > 1. change protocol version and forbid communication between 0.7.x dog
> > > > > > >    and the latest sheep
> > > > > > > 
> > > > > > > 2. change struct sd_node and exclude 	struct rb_node  rb; from the
> > > > > > >    members
> > > > > > > 
> > > > > > > I think 2 would be a better option. sd_node is a data structure which
> > > > > > 
> > > > > > > can be exchanged via network and such a data structure shouldn't have
> > > > > > > pointers as its members. Current sd_node is really ugly.
> > > > > > 
> > > > > > Talk is cheap, feel free to post a patch that fullfill current features:
> > > > > > 1. allow 6k+ nodes (100k+ vnodes) manage effeciently (at least don't degrade)
> > > > > > 2. don't break current usage (we already have a production cluster based on
> > > > > >    current latest master, so your patch should work with current data without
> > > > > >    re-format or any tricky upgrade. We want, at least for v0.8.x
> > > > > >    series, we can get a smooth update for furture.
> > > > > 
> > > > > OK, I'll not follow the 2nd option.
> > > > > 
> > > > > > 
> > > > > > There are many ugliness in sheep and also many existing problems. My advice is
> > > > > > solve problems at first and don't invest time on something that changing core
> > > > > > code just for something you think of beatiful. Chanign coring is always at risk
> > > > > > to introduce new bugs and need time to stablize. I introduce rb_tree to manage
> > > > > > vnodes because we have to use it to manage more than 10 thousands of vnodes, which
> > > > > > old array approach can't handle well. So what is your motivation besides 'ugliness'?
> > > > > 
> > > > > Generally speaking, data structures which can be sent/recv network
> > > > > shouldn't contain pointers. That's all.
> > > > 
> > > > Yes, I admited it was ugly but at the time I tried to remove 1024 max nodes, this
> > > > is the most easy way to solve the problem and I couldn't manage to not include
> > > > in the sd_node or sd_vnode for a tree management.
> > > > 
> > > > > Do you agree with incrementing protocol version?
> > > > 
> > > > Yes, I do agree.
> > > 
> > > OK, I'll send a patch for it later.
> > 
> > I noticed that current policy of sheep's protocol version matching is
> > like below:
> > 
> > if (hdr->proto_ver > SD_PROTO_VER) {
> > 	rsp->result = SD_RES_VER_MISMATCH;
> > 	goto done;
> > }
> 
> I think you should consider increment SD_SHEEP_PROTO_VER in include/internal_proto.h
> which controle protocol between sheep and dog.
> 
> > 
> > # from queue_request()
> > 
> > Incrementing the protocol version number cannot break compatibility
> > between sheep and clients.
> > 
> > I think it should be changed as more strict one like this:
> > 
> > if (hdr->proto_ver != SD_PROTO_VER) {
> > 	rsp->result = SD_RES_VER_MISMATCH;
> > 	goto done;
> > }
> > 
> > This policy also requires changes to QEMU and tgt. But it should be
> > done because of the change of sd_node layout.
> 
> sd_node is just internal state of sheep daemon, nothing to do with tgt or QEMU.
> Unless include/sheepdog_proto.h has something to break compatiblity, we shouldn't
> try to break it.

Ah sorry, I misunderstand that SD_SHEEP_PROTO_VER is also used for
sheep-dog communication, not only for sheep-sheep communication.

I'll increment SD_SHEEP_PROTO_VER.

Thanks,
Hitoshi


More information about the sheepdog mailing list