[Sheepdog] support object recovery

Fri Feb 12 22:49:33 CET 2010

MORITA Kazutaka wrote:
> On Fri, Jan 22, 2010 at 11:01 AM, Piavlo <piavka at cs.bgu.ac.il> wrote:
>   
>>> Yes, vdi directory is like metadata, which we called `super object' before.
>>> Currently, the redundancy of vdi directory is same as objects redundancy.
>>> Perhaps, we should change it because vdi directory is more important than
>>> data objects.
>>>
>>>       
>> Since this vdi metadata overhead is very small, it is probably
>> reasonable to store the vdi metadata for all vm images in the sheepdog
>> cluster
>> on each storage node. This would not require any metadata recovery if
>> some node goes offline.
>>     
>
> Probably, the cost of updating vdi directory on every node is not so cheap
> if the number of nodes is large, because each node must return the ack
>   
Return the ack to where exactly? I mean there is no head node in the
cluster - so who will wait for these acks?
The node on which vm image creation was invoked? As we discussed earlier,
it should be possible in future implementations to separate the
sheepdog client from the server.

> after updating local vdi directory.
>   
Also, with the current implementation, the servers holding vdi metadata
(meta-servers) may become a bottleneck.
For example, say I have a 100-node cluster with --copies=3. With the current
implementation, just 3 nodes will be the meta-servers (each one with exactly
the same metadata).
AFAIU this means that all sheepdog queries will have to go through these
meta-servers, which, besides holding the metadata, are also used for serving
vm block storage.
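
To illustrate the concern, here is a rough Python sketch. It is not
sheepdog code: the place() function and its hash-ranking scheme are
assumptions made up for this example, standing in for whatever placement
the real implementation uses. It only shows that a single metadata object
stored with --copies=3 lands on 3 of the 100 nodes.

```python
import hashlib

def place(object_id, nodes, copies):
    """Pick `copies` distinct nodes for an object.

    Hypothetical placement: rank nodes by a hash of (object, node) and
    take the first `copies`. This is an assumption, not sheepdog's scheme.
    """
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha1(f"{object_id}:{n}".encode()).hexdigest(),
    )
    return ranked[:copies]

nodes = [f"node{i:03d}" for i in range(100)]

# One vdi directory object, replicated like any other object.
meta_servers = place("vdi-directory", nodes, copies=3)

print(f"{len(meta_servers)} of {len(nodes)} nodes hold the vdi directory")
```

Whatever the real placement looks like, the ratio is the point: every
metadata lookup is served by those few nodes.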

Maybe we can get by with the following:
Once a specific sheepdog server node receives a request to store a vm
block - and this is the first block for that vm - it will also request
and create vdi metadata for that vm (this should be negligible overhead).
This way each node will have the vdi metadata of each vm it stores blocks for.
So vdi metadata recovery will be piggybacked on vm block
recovery, and would not require any special
implementation. Of course, before the sheepdog cluster starts distributing
blocks for a new vm, it first needs to create the vdi
metadata on just the --copies nodes (which the current implementation
already does anyway).
So this looks to me like one small change that avoids the
need for metadata recovery. What do you think?
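
To make the idea more concrete, here is a minimal Python sketch of the
proposal. All names here (Node, Cluster, place, fetch_meta, write_block)
are hypothetical and invented for illustration, and the hash-based
placement is an assumption rather than sheepdog's actual scheme.

```python
import hashlib

class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}    # (vm, block index) -> data
        self.vdi_meta = {}  # vm -> metadata dict

class Cluster:
    def __init__(self, node_count, copies):
        self.nodes = [Node(f"node{i:03d}") for i in range(node_count)]
        self.copies = copies

    def place(self, object_id):
        # Hypothetical placement: rank nodes by hash, take the first `copies`.
        ranked = sorted(
            self.nodes,
            key=lambda n: hashlib.sha1(
                f"{object_id}:{n.name}".encode()
            ).hexdigest(),
        )
        return ranked[: self.copies]

    def create_vdi(self, vm, meta):
        # Initial creation on just the --copies nodes, as is done today.
        for node in self.place(f"vdi:{vm}"):
            node.vdi_meta[vm] = dict(meta)

    def fetch_meta(self, vm):
        # Ask any node that already holds the metadata for this vm.
        for node in self.nodes:
            if vm in node.vdi_meta:
                return dict(node.vdi_meta[vm])
        raise KeyError(vm)

    def write_block(self, vm, index, data):
        for node in self.place(f"{vm}:{index}"):
            # First block of this vm on this node: copy the metadata too.
            if vm not in node.vdi_meta:
                node.vdi_meta[vm] = self.fetch_meta(vm)
            node.blocks[(vm, index)] = data

cluster = Cluster(node_count=100, copies=3)
cluster.create_vdi("vm1", {"size_blocks": 64})
for i in range(64):
    cluster.write_block("vm1", i, b"x")

# Every node that stores a block of vm1 also holds vm1's metadata.
holders = [n for n in cluster.nodes if any(k[0] == "vm1" for k in n.blocks)]
```

In this sketch the metadata copy happens only once per vm per node, so
the extra cost is paid on the first block and not on later writes.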

Thanks
Alex
> However, no need for metadata recovery is very good.
> We should consider more about it.
>