[sheepdog] on the wire protocols: structure, versioning, etc

Fri Jun 15 17:54:03 CEST 2012

I've recently been going over the on-the-wire structures a bit, and here
are a few notes and questions:

* endianness annotations

	I think we should add Linux-kernel style endianness annotations
	for the on-the-wire (and on-disk) structures.  Not only does
	this allow for interoperability between hosts of different
	endianness, which some might consider only a minor feature, but
	it also makes it very clear in the code which structures go on
	the wire, so that they are handled with care when changes are
	made, and special care is taken of alignment and similar
	issues.
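
	As a rough sketch of what such annotations could look like
	(every sd_*/SD_* name below is made up for illustration, and
	little-endian is picked arbitrarily as the wire byte order; none
	of this is existing sheepdog code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical annotation macros, modelled on the kernel's __bitwise and
 * __force.  Only sparse (which defines __CHECKER__) sees the attributes;
 * for a normal compiler they expand to nothing.
 */
#ifdef __CHECKER__
#define SD_BITWISE __attribute__((bitwise))
#define SD_FORCE   __attribute__((force))
#else
#define SD_BITWISE
#define SD_FORCE
#endif

/* Little-endian wire types; the byte order is an assumption of this sketch. */
typedef uint32_t SD_BITWISE sd_le32;
typedef uint64_t SD_BITWISE sd_le64;

/* Conversions go through explicit byte access, so they work on any host. */
static inline sd_le32 sd_cpu_to_le32(uint32_t v)
{
	union { uint32_t raw; uint8_t b[4]; } u;
	int i;

	for (i = 0; i < 4; i++)
		u.b[i] = (uint8_t)(v >> (8 * i));
	return (SD_FORCE sd_le32)u.raw;
}

static inline uint32_t sd_le32_to_cpu(sd_le32 v)
{
	union { uint32_t raw; uint8_t b[4]; } u;
	uint32_t out = 0;
	int i;

	u.raw = (SD_FORCE uint32_t)v;
	for (i = 0; i < 4; i++)
		out |= (uint32_t)u.b[i] << (8 * i);
	return out;
}

static inline sd_le64 sd_cpu_to_le64(uint64_t v)
{
	union { uint64_t raw; uint8_t b[8]; } u;
	int i;

	for (i = 0; i < 8; i++)
		u.b[i] = (uint8_t)(v >> (8 * i));
	return (SD_FORCE sd_le64)u.raw;
}

static inline uint64_t sd_le64_to_cpu(sd_le64 v)
{
	union { uint64_t raw; uint8_t b[8]; } u;
	uint64_t out = 0;
	int i;

	u.raw = (SD_FORCE uint64_t)v;
	for (i = 0; i < 8; i++)
		out |= (uint64_t)u.b[i] << (8 * i);
	return out;
}

/* Made-up wire header: the field types alone mark it as a wire structure. */
struct sd_example_hdr {
	sd_le32 opcode;
	sd_le32 data_length;
	sd_le64 oid;
};

/* Fill a header from host-order values; sparse would flag a plain store. */
static inline void sd_example_hdr_fill(struct sd_example_hdr *hdr,
				       uint32_t opcode, uint32_t data_length,
				       uint64_t oid)
{
	assert(hdr != NULL);
	hdr->opcode = sd_cpu_to_le32(opcode);
	hdr->data_length = sd_cpu_to_le32(data_length);
	hdr->oid = sd_cpu_to_le64(oid);
}
```

	With types like these, assigning a host-order integer directly
	to hdr->opcode compiles as usual but is reported by sparse,
	which is what makes the wire structures stand out in review.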

	Which brings me to the next issue:

* clearly defining the on wire protocols

	Right now there are a lot of structures on the wire in places
	where you don't expect them.  Besides the obvious bits in
	include/sheepdog_proto.h there are:

	    - some additional opcodes and their structures in
	      include/sheep.h, which also hosts some code shared
	      between collie and the sheep daemon
	    - more of them in sheep/sheep_priv.h, which otherwise just
	      contains structures private to the main module of the
	      sheep daemon and not even shared with the cluster driver
	    - some payloads defined directly in sheep/group.c
	    - the cluster driver specific event types, defined directly
	      inside the cluster drivers

	This leads into the next thing:

* splitting the different protocols

	While all communication between the components of sheepdog
	shares some common constants, there are at least two, if not
	three, different sub-protocols:

	    - the main user facing protocol, spoken between the qemu
	      frontend (or any other plain I/O frontend) and the gateway
	      sheep
	    - the protocol between sheep daemons, including the
	      cluster driver level events, join/leave/notify messages,
	      SD_FLAG_CMD_IO_LOCAL type read/write requests, and get
	      object list commands for recovery
	    - any magic admin communication between collie and sheep,
	      although by some argument these could be added to either
	      of the above ones on a case by case basis.

	Identifying them as different protocols will also allow us to
	version them differently, including basically unlimited
	backwards compatibility for the frontend, while allowing us to
	increment the backend protocol revision and thus either let a
	sheep with the wrong version fail the join gracefully or, with
	some effort, allow sheep of different versions to interoperate
	(with a lot of testing overhead).
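
	To make the two versioning policies concrete, here is a minimal
	sketch.  All names and version numbers are invented for the
	example, and it assumes the simpler of the two backend options
	(a sheep with a different peer version fails the join):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and numbers below are invented for this sketch. */
#define SD_EX_FRONTEND_PROTO_MIN 1  /* oldest frontend version still served */
#define SD_EX_FRONTEND_PROTO_MAX 3  /* newest frontend version understood */
#define SD_EX_PEER_PROTO_VER     7  /* bumped on any sheep-to-sheep change */

enum sd_ex_join_result {
	SD_EX_JOIN_OK,
	SD_EX_JOIN_VERSION_MISMATCH,
};

/* Hypothetical join payload carrying the joining sheep's peer version. */
struct sd_ex_join_msg {
	uint8_t peer_proto_ver;
};

/* Frontend: accept a whole range, so old clients keep working. */
static inline bool sd_ex_frontend_ver_ok(uint8_t ver)
{
	return ver >= SD_EX_FRONTEND_PROTO_MIN &&
	       ver <= SD_EX_FRONTEND_PROTO_MAX;
}

/* Peer: require an exact match and reject the join cleanly otherwise. */
static inline enum sd_ex_join_result
sd_ex_check_join(const struct sd_ex_join_msg *msg)
{
	assert(msg != NULL);
	if (msg->peer_proto_ver != SD_EX_PEER_PROTO_VER)
		return SD_EX_JOIN_VERSION_MISMATCH;
	return SD_EX_JOIN_OK;
}
```

	Interoperating across peer versions would replace the equality
	test with per-version handling, which is where the testing
	overhead mentioned above comes from.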

If everyone agrees with these basic concepts I'd like to move forward
with:

 (1) split each sub-protocol into a well-documented header
 (2) add sparse annotations
 (3) replace the SD_FLAG_CMD_IO_LOCAL flag with different operation
     types for the sheep peer I/O.  Not only does this make clear they
     are part of a different protocol, but it will also allow us to
     use normal ops.c-like dispatch for the gateway
 (4) add separate versioning for the sheep peer protocol, and probably
     the per-cluster driver protocols.