From: Liu Yuan <tailai.ly at taobao.com>

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 doc/farm-internal.txt |  121 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 121 insertions(+), 0 deletions(-)
 create mode 100644 doc/farm-internal.txt

diff --git a/doc/farm-internal.txt b/doc/farm-internal.txt
new file mode 100644
index 0000000..29edc8b
--- /dev/null
+++ b/doc/farm-internal.txt
@@ -0,0 +1,121 @@
+                        ==================
+                            Farm Store
+                        ==================
+
+        Liu Yuan <namei.unix at gmail.com>        Taobao Inc.
+
+1. OVERVIEW
+
+Farm is an object store for Sheepdog on a per-node basis. It consists of a
+backend store, which caches snapshot objects, and a working directory, which
+stores the objects that Sheepdog currently operates on. Since Sheepdog does its
+I/O against the working directory, the I/O performance for VM guests should be
+practically the same as with Simple Store.
+
+Snapshots are triggered either by the system recovery code or by users, and
+Farm is supposed to restore all object states to what they were at the time
+the user snapshot was taken. In this context, 'snapshot object' means both meta
+objects and data objects.
+
+2. DESIGN
+
+Simply put, Farm resembles git a lot (at both the code and the idea level).
+There are three object types, named 'data', 'trunk' and 'snapshot'[*], which
+are similar to git's 'blob', 'tree' and 'commit'.
+
+[*] shortened to 'snap' below.
+
+A 'data' object is just Sheepdog's I/O object, except that it is named by the
+sha1 of its content. Data objects with the same content are therefore mapped to
+a single sha1 file, which achieves node-wide data sharing.
+
+A 'trunk' object ties data objects together into a flat directory structure at
+the time the snapshot is taken. The trunk object provides a means to find old
+data objects in the store.
+
+A 'snap' object describes the snapshot, either initiated by users or triggered
+by the recovery code. The snap object refers to one of the trunk objects. The
+two snap log files provide a means to name the desired snap object.
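The content-addressed naming of 'data' objects described above can be sketched
in a few lines of C. This is a minimal illustration, not Farm's code: FNV-1a
stands in for sha1 so that the example needs only the C standard library, the
store is a small in-memory table rather than sha1 files on disk, and toy_hash,
toy_store and toy_put are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: 64-bit FNV-1a stands in for sha1. Farm itself names
 * objects by the sha1 of their content.
 */
static uint64_t toy_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

#define TOY_MAX_OBJS 16
#define TOY_OBJ_SIZE 32

/* Hypothetical in-memory backend store: one slot per distinct content. */
struct toy_store {
	uint64_t name[TOY_MAX_OBJS];
	unsigned char data[TOY_MAX_OBJS][TOY_OBJ_SIZE];
	size_t len[TOY_MAX_OBJS];
	size_t nr;
};

/*
 * Store a buffer under the hash of its content and return that name.
 * Identical content hashes to the same name and reuses the existing slot.
 * Hash collisions are ignored in this sketch.
 */
static uint64_t toy_put(struct toy_store *s, const void *buf, size_t len)
{
	uint64_t name = toy_hash(buf, len);
	size_t i;

	assert(len <= TOY_OBJ_SIZE);
	for (i = 0; i < s->nr; i++)
		if (s->name[i] == name)
			return name;	/* already cached: shared, not copied */

	assert(s->nr < TOY_MAX_OBJS);
	s->name[s->nr] = name;
	memcpy(s->data[s->nr], buf, len);
	s->len[s->nr] = len;
	s->nr++;
	return name;
}
```

Putting the same content twice returns the same name and leaves a single copy
in the table, which is the sharing effect the 'data' object relies on.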
+
+All the objects are described in the context of taking snapshots or retrieving
+old data from snapshotted objects; that is, the objects are 'cached' into the
+Farm store by performing snapshot operations.
+
+3. OBJECT LAYOUT
+
+All the objects (snap, trunk, data) in Farm are built on sha1_file operations.
+sha1_file gives us compression and consistency awareness, independent of the
+content or the type of the object.
+
+An object successfully inflates to a stream of bytes that forms the sequence
+
+    <sha1_file_hdr> + <binary object data>
+           |                   |
+         header             payload
+
+The payload of a data object is the compressed content of Sheepdog's I/O object.
+
+For a trunk object, the compressed content is
+
+    <array of the struct trunk_entry>
+
+    struct trunk_entry {
+            uint64_t oid;
+            unsigned char sha1[SHA1_LEN];
+    };
+
+For a snap object, the compressed content is
+
+    <trunk_sha1> + <array of the struct sd_node>
+
+As for snap operations, besides the snap object, Farm has two log files with
+the structure below
+
+    struct snap_log {
+            int epoch;
+            uint64_t time;
+            unsigned char sha1[SHA1_LEN];
+    };
+
+This provides an internal naming mechanism and helps us find snap objects by
+epoch.
+
+4. STALE OBJECT
+
+When an object is stored into the backend store at snapshot time, either
+
+    a) its content has not changed, so it points to the same old sha1_file
+       (no stale object)
+    or
+    b) its content was updated, so it points to a new object with a new sha1.
+
+We need to remove the stale object in case b), but only on the assumption that
+it was generated by the recovery code. [*]
+
+When we store a new snapshot object into the backend store, it is a safe and
+convenient time to remove the old object with the same object ID.
+
+For user snapshot objects, we don't need to remove them until the snapshot is
+deleted.
+
+[*] Here I assume we don't need to restore to 'sys epoch' state.
+
+5. FLOW FIGURE
+
+
+               sys_snap, user_snap          snapshot requests
+                        |                           |
+                        |put/get snap_sha1          | trigger
+                        v                           |
+ +----------+        +------+        +--------+     v      +----------+
+ |          |<------>| snap |<++++++>|        | <========> |          |
+ |          |        +------+        |        |            |   Farm   |
+ |          |                        | trunk  |            | Working  |   I/O   +-------+
+ |          |<---------------------->|        |            | Directory| <~~~~~~>|sheep  |
+ |   Farm   |                        +--------+            |          |         +-------+
+ | Backend  |                                              |          |
+ |  Store   |                                              |          |
+ |          |<-------------------------------------------->|          |
+ |          |                                              |          |
+ +----------+                                              +----------+
+
+<-----> put/get objects to/from Farm Store
+<+++++> put/get trunk_sha1 to/from snap object
+<=====> put/get oid/oid_sha1 pairs to/from trunk object
+
-- 
1.7.8.2