[sheepdog] [PATCH v2 05/10] work: try to create worker threads in worker_thread_request_done

Joseph Glanville joseph at cloudscaling.com
Tue May 14 12:01:55 CEST 2013


Aye. I was thinking something more separate though.

The implementation I had in mind was something more like this:
- Only back up read-only objects; if you want a VDI to be protected by
a backup, create a snapshot of it before starting the cluster backup.
- The cluster backup is initiated and managed by a single node (the
master), probably the node the command is issued on.
- The master first creates a copy of the VDI tree.
- The master instructs the sheep nodes to begin hashing any currently
unhashed read-only objects.
- The master fetches a manifest of objects from the backup storage as
a list of hashes.
- The master collects the manifests of hashes from the sheep nodes and
sends each sheep the list of its objects not found in the storage
manifest, for it to back up.
- The master writes a new manifest to the backup storage; this is a
copy of the VDI tree as of the read-only state, annotated with the
hashes of the objects.
- The master waits on completion notifications from the sheeps.
- The backup is now complete.
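The manifest diff at the heart of the steps above could be sketched
roughly like this (a minimal sketch, not sheepdog code; all names are
hypothetical, and Python is used only for brevity):

```python
import hashlib


def hash_object(data: bytes) -> str:
    """Content hash of one read-only object (SHA-1, as Farm uses)."""
    return hashlib.sha1(data).hexdigest()


def objects_to_backup(node_manifests, storage_manifest):
    """Master-side diff: given per-sheep lists of object hashes and the
    manifest already present on the backup storage, keep for each sheep
    only the hashes the storage does not yet have.  Objects that appear
    in the storage manifest are deduplicated by content and skipped."""
    existing = set(storage_manifest)
    return {node: [h for h in hashes if h not in existing]
            for node, hashes in node_manifests.items()}
```

For example, with two sheeps whose manifests are {"sheep0": ["a", "b"],
"sheep1": ["b", "c"]} and a storage manifest of ["b"], sheep0 would be
told to back up "a" and sheep1 to back up "c". In the real design the
storage manifest would of course be fetched from NFS/S3/Swift rather
than passed in.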

The idea is that each of the sheeps can talk to the backup storage and
perform the backup in parallel. This is really only compatible with
NFS/S3/Swift.
For testing we can assume the storage is NFS but just use a local directory.
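The sheep-side step against such a local directory might look like the
following sketch (hypothetical helper, not sheepdog code): objects are
stored under their content hash, which makes the write idempotent and
gives dedup for free.

```python
import hashlib
import os


def backup_object(data: bytes, backup_dir: str) -> str:
    """Store one read-only object in a shared directory (a local
    stand-in for NFS), keyed by its content hash.  If an object with
    the same hash already exists, nothing is written."""
    h = hashlib.sha1(data).hexdigest()
    path = os.path.join(backup_dir, h)
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.rename(tmp, path)  # atomic publish on POSIX filesystems
    return h
```

Running it twice on the same object leaves a single file in the
directory, named after the hash.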

Because of how the VDI tree is structured you don't really need
hashing; we already get good-enough dedup from CoW. Hashing just makes
the system more elegant and provides the ability to add integrity
checks at a later date.
In an ideal world hashing would only happen when the system has some
idle CPU and I/O, and would only be forced to completion when a backup
is required.
Think along the lines of "collie scrub $VDI" or "collie cluster
scrub", similar to how ZFS protects data. Even though the data isn't
hashed at write time, this should provide the same guarantees: because
the data of all the replicas can be hashed and compared, corruption of
one object (even corruption that happens before hashing) can be caught
and fixed with a scrub, assuming copies=3.
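The scrub check this relies on could be sketched as follows (a
hypothetical illustration of the majority-vote idea, not sheepdog
code): hash every replica of an object, take the majority hash as the
good one, and flag the outvoted replicas for repair.

```python
import hashlib
from collections import Counter


def scrub_object(replicas):
    """Compare the content hashes of all replicas of one object.
    With copies=3, a single corrupted replica is outvoted by the two
    intact ones.  Returns (good_data, indexes of replicas to repair)."""
    hashes = [hashlib.sha1(r).hexdigest() for r in replicas]
    majority, _ = Counter(hashes).most_common(1)[0]
    good = replicas[hashes.index(majority)]
    bad = [i for i, h in enumerate(hashes) if h != majority]
    return good, bad
```

So for replicas [b"ok", b"corrupt", b"ok"] the scrub would return the
good data and flag replica 1 for rewriting.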

Joseph.

On Mon, May 13, 2013 at 11:10 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 05/14/2013 01:26 PM, Joseph Glanville wrote:
>> You can also use it for dedup of stale/snapshot objects within the
>> cluster: when you discover duplicates you can repoint the links to
>> keep only the number of copies specified by the VDI with the largest
>> copy count.
>
> Back in the very old days, we put stale objects in the Farm, but later
> we found it led to very poor performance for recovery, so we relocated
> stale objects into the working directory without the SHA trick.
>
> For VDI snapshot objects, we might store them in the Farm, but this is
> not yet implemented. There are also some drawbacks to this approach
> besides the added code complexity: the Farm isn't MD friendly, so we
> have to choose one volume for it.
>
> Thanks,
> Yuan
