MORITA Kazutaka wrote:
> On Fri, Jan 22, 2010 at 11:01 AM, Piavlo <piavka at cs.bgu.ac.il> wrote:
>
>>> Yes, vdi directory is like metadata, which we called `super object' before.
>>> Currently, the redundancy of vdi directory is same as objects redundancy.
>>> Perhaps, we should change it because vdi directory is more important than
>>> data objects.
>>>
>>>
>> Since this vdi metadata overhead is very small, it is probably
>> reasonable to store the vdi metadata for all vm images in the sheepdog
>> cluster on each storage node. This would not require any metadata
>> recovery if some node goes offline.
>>
>
> Probably, the cost of updating vdi directory on every node is not so cheap
> if the number of nodes is large, because each node must return the ack
Return the ack to where exactly? I mean, there is no head node in the
cluster, so who will wait for these acks?
The node on which vm image creation was invoked?
As we discussed earlier, it should be possible in future
implementations to separate the sheepdog client from the server.
> after updating local vdi directory.
>
Also, with the current implementation, the servers holding vdi metadata
(meta-servers) may become a bottleneck.
For example, say I have a 100-node cluster with --copies=3. With the
current implementation just 3 nodes will be the meta-servers (each one
with exactly the same metadata).
AFAIU this means that all sheepdog queries will have to go through these
meta-servers, which besides the metadata are also used for serving vm
block storage.

Maybe we can get by with the following:
Once a specific sheepdog server node receives a request to store a vm
block, and this is the first block for that vm, it will also request and
create the vdi metadata for that vm (this should be negligible overhead).
This way each node will have the vdi metadata of each vm it stores blocks
for.
So vdi metadata recovery will be piggybacked on vm block recovery, and
would not require any special implementation.
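
To make the idea a bit more concrete, here is a rough sketch in Python.
It is not sheepdog code: the Node and Cluster classes, the hash-based
placement and all method names are made up for illustration only, and it
assumes --copies is not larger than the number of nodes.

```python
import hashlib


class Node:
    """Hypothetical storage node: holds vm blocks plus vdi metadata."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}    # (vdi, block_index) -> bytes
        self.vdi_meta = {}  # vdi -> metadata dict

    def store_block(self, vdi, index, data, fetch_meta):
        # First block of this vdi on this node: copy the metadata too.
        if vdi not in self.vdi_meta:
            self.vdi_meta[vdi] = dict(fetch_meta(vdi))
        self.blocks[(vdi, index)] = data


class Cluster:
    """Toy cluster; placement is an assumed stand-in, not sheepdog's."""

    def __init__(self, node_names, copies):
        self.nodes = [Node(n) for n in node_names]
        self.copies = copies

    def _targets(self, key):
        # Hash the key and take `copies` consecutive nodes.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        start = h % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def _fetch_meta(self, vdi):
        # Ask any node that already holds the metadata for this vdi.
        for node in self.nodes:
            if vdi in node.vdi_meta:
                return node.vdi_meta[vdi]
        raise KeyError(vdi)

    def create_vdi(self, vdi, size):
        # As today: metadata goes to just `copies` nodes at creation time.
        for node in self._targets(vdi):
            node.vdi_meta[vdi] = {"name": vdi, "size": size}

    def write_block(self, vdi, index, data):
        for node in self._targets(f"{vdi}:{index}"):
            node.store_block(vdi, index, data, self._fetch_meta)

    def meta_holders(self, vdi):
        return [n.name for n in self.nodes if vdi in n.vdi_meta]
```

In this sketch, every node that ends up with at least one block of a vm
also holds that vm's vdi metadata, so the number of metadata holders grows
with the number of nodes storing the image instead of staying at --copies.
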
Of course, before the sheepdog cluster starts distributing blocks for a
new vm, it first needs to create the vdi metadata on just the --copies
nodes (which the current implementation already does anyway).
So this looks, to me, like one small change that avoids the need for
metadata recovery.
What do you think?

Thanks
Alex
> However, no need for metadata recovery is very good.
> We should consider more about it.
>