[Sheepdog] support object recovery

Mon Feb 15 08:21:18 CET 2010

MORITA Kazutaka wrote:
>> Also, with the current implementation, the servers holding vdi metadata
>> (meta-servers) may become a bottleneck.
>> For example, say I have a 100-node cluster with --copies=3. With the
>> current implementation, just 3 nodes will be meta-servers (each one
>> with exactly the same metadata).
>> AFAIU this means that all sheepdog queries will have to go through these
>> meta-servers, which, besides holding the metadata, also serve VM block
>> storage.
>>
>
> I don't think the servers holding the vdi metadata directory become the
> bottleneck, because the directory contains only the list of vdi
> names, creation times, redundancies, etc., and VMs don't access the vdi
> directory at all after they open the vdi.
>
> The data object IDs of each vdi are stored in the vdi object. The vdi
> object is accessed each time the VM allocates a new data object, but
> there is no bottleneck server, because the vdi objects are
> distributed over all the nodes, just like data objects.
That's not what I observed: all vdi metadata objects currently end
up on the very same set of servers.
So if I have 100 nodes and use --copies=3, there are CURRENTLY just
3 vdi metadata servers for all VMs.
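The disagreement comes down to how objects are placed. A minimal sketch, assuming placement by consistent hashing of the object ID onto a ring of nodes (the hash choice, the `Ring` class, the `place` helper and the node names are all hypothetical, not sheepdog's code): if every vdi object has its own ID, different vdis should land on different sets of `--copies` nodes, while a single fixed directory object always maps to the same 3 servers.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a point on a 64-bit ring (illustrative hash choice).
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring; not sheepdog's actual code."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to smooth the distribution.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.keys = [p for p, _ in self.points]

    def place(self, object_id: str, copies: int):
        # Walk clockwise from the object's hash, collecting distinct nodes.
        start = bisect.bisect(self.keys, _hash(object_id))
        chosen = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == copies:
                    break
        return chosen


ring = Ring([f"node{i}" for i in range(100)])
# Distinct vdi object IDs should spread over many different replica sets...
vdi_sets = {tuple(ring.place(f"{v:012x}0000", 3)) for v in range(1, 21)}
print(len(vdi_sets), "distinct replica sets for 20 vdi objects")
# ...whereas one fixed directory object always maps to the same 3 nodes.
print(ring.place("vdi-directory", 3))
```

If the observed behaviour is that every vdi object sits on the same 3 servers, that would suggest the objects being replicated are keyed by something shared (such as the directory) rather than by per-vdi IDs, though this sketch cannot confirm which.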

Just to be sure, is the following a vdi metadata object?
fire-srv3 0 # ls -la vdi/pacemaker1.dmz.cs.bgu.ac.il/0000000000040000
-rw-r----- 1 root root 0 2010-02-12 20:33 vdi/pacemaker1.dmz.cs.bgu.ac.il/0000000000040000
fire-srv3 0 #
So the vdi object has no data (its size is 0); its name just indicates
which prefix (000000000004 in this case) is used to store the vdi data blocks.
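To make that naming scheme concrete, a small sketch, assuming (based only on the listing above) that object names are 16 hex digits, the first 12 being the vdi prefix and the last 4 a hypothetical per-vdi block index. The helper names are illustrative, and the real sheepdog object ID layout may differ.

```python
PREFIX_DIGITS = 12  # "000000000004" in the listing above
INDEX_DIGITS = 4    # assumed: the remaining hex digits hold the block index


def vdi_prefix(vdi_object_name: str) -> str:
    """Extract the prefix from a vdi object file name (hypothetical layout)."""
    if len(vdi_object_name) != PREFIX_DIGITS + INDEX_DIGITS:
        raise ValueError("expected a 16-digit hex object name")
    int(vdi_object_name, 16)  # raises ValueError if the name is not hex
    return vdi_object_name[:PREFIX_DIGITS]


def data_object_name(prefix: str, index: int) -> str:
    """Build the name of data block `index` under a vdi prefix (assumed scheme)."""
    if not 0 <= index < 16 ** INDEX_DIGITS:
        raise ValueError("block index out of range")
    return f"{prefix}{index:0{INDEX_DIGITS}x}"


prefix = vdi_prefix("0000000000040000")
print(prefix)                       # 000000000004
print(data_object_name(prefix, 1))  # 0000000000040001
```

Under this assumed layout, index 0 yields the same name as the vdi object itself, which is consistent with the empty file shown above but not confirmed by it.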

> The vdi object
> recovery is also done in the same process as data object recovery.
>
>   
Now I'm confused, since in your previous mail you said that vdi
object recovery is not implemented yet.
>> Maybe we can get by with the following:
>> Once a specific sheepdog server node receives a request to store a VM
>> block, and this is the first block for that VM, it will also request
>> and create the vdi metadata for that VM (this should be negligible overhead).
>> This way each node will have the VM metadata of every VM it stores blocks for.
>> So vdi metadata recovery will be piggybacked on VM block
>> recovery and would not require any special
>> implementation. Of course, before the sheepdog cluster starts distributing
>> blocks for a new VM, it first needs to create the vdi metadata for that
>> VM on just --copies nodes (which the current implementation
>> already does anyway).
>> So this looks, to me, like one small change that avoids the
>> need for metadata recovery. What do you think?
>>
>>     
>
> In the initial release of sheepdog, we stored the vdi directory
> information in one object (we called it a super object), but
> to get rid of the btrfs dependency, we changed it to the current style.
>
>   
Sorry if I wasn't clear enough; I'm not suggesting changing the FORMAT
of the vdi metadata objects.
What I'm saying is: if a node stores any data block of a specific VM
(even just one data block), then this node would also store the vdi
metadata object for that VM.
So when the node receives a request to store the very first data block
for a specific VM, it would also pull the vdi metadata object for that VM.
In this situation there is no need for a separate vdi metadata recovery
implementation, since it's guaranteed that once a node has VM data, it also
has that VM's vdi metadata object locally.
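The proposal can be sketched roughly as follows. This is a minimal illustration, assuming a node that keeps vdi objects and data blocks in local dictionaries and has some way to pull a vdi object from its peers; `StorageNode`, `store_block`, `recover_block` and the `fetch_vdi` callback are hypothetical names, not sheepdog's API. The point it shows is the invariant: because recovery goes through the same store path, a node that holds any block of a VM always holds that VM's vdi object too.

```python
from typing import Callable, Dict, List, Tuple


class StorageNode:
    """Hypothetical node illustrating the proposal; not sheepdog's API."""

    def __init__(self, fetch_vdi: Callable[[str], bytes]):
        self.fetch_vdi = fetch_vdi                # pulls a vdi object from peers
        self.vdi_objects: Dict[str, bytes] = {}   # vdi name -> metadata object
        self.blocks: Dict[Tuple[str, int], bytes] = {}

    def store_block(self, vdi: str, index: int, data: bytes) -> None:
        # First block of this VM on this node: pull its vdi object first,
        # so "holds data for vdi" always implies "holds vdi metadata locally".
        if vdi not in self.vdi_objects:
            self.vdi_objects[vdi] = self.fetch_vdi(vdi)
        self.blocks[(vdi, index)] = data

    def recover_block(self, vdi: str, index: int, data: bytes) -> None:
        # Recovery reuses the normal store path, so vdi metadata is
        # recovered as a side effect of recovering data blocks.
        self.store_block(vdi, index, data)


fetched: List[str] = []


def fetch(vdi: str) -> bytes:
    # Stand-in for a network request to the nodes holding the vdi object.
    fetched.append(vdi)
    return b"meta:" + vdi.encode()


node = StorageNode(fetch)
node.store_block("vm1", 0, b"a")
node.store_block("vm1", 1, b"b")   # vm1 metadata already local, no fetch
node.recover_block("vm2", 7, b"c")  # recovery pulls vm2 metadata too
print(fetched)  # ['vm1', 'vm2']
```

The trade-off in this sketch is one extra fetch per (node, VM) pair the first time a node sees that VM, in exchange for not needing a separate metadata recovery pass.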


 Thanks
 Alex

> The related discussion is here:
>   http://lists.wpkg.org/pipermail/sheepdog/2009-December/000076.html
>
> Thanks,
>
> Kazutaka Morita
>