Hi,

Thanks for clarifying the vdi directory/object difference.

AFAIU the vdi directory does not contain information about which node holds
the vdi object, only the vdi object id, and thanks to consistent hashing
over the vdi object id and the node ids we can derive on which nodes the
vdi object is actually stored (a small sketch of how I understand this
placement is at the end of this mail)?

Does the vdi object contain only data object ids (since, again, the data
object id and the node ids are enough to derive on which nodes a data
object is stored), or does the vdi object also explicitly contain, for each
data object id, the list of nodes where that data object is actually stored?

>>>
>>> I think the servers with vdi metadata directory does not become the
>>> bottleneck. It is because the directory contains only the list of vdi
>>> names, creation times, redundancies, etc,

In my case I see only a single zero-size file
/sheepdog/0/vdi/vdiname/0000000000040000; from this one can only derive the
vdi name, the vdi object id and the vdi creation time, but not the
redundancy and the rest?

What info does /{store_dir}/epoch/0000000X contain and when is it accessed?

Also, how is /{store_dir}/obj/0000000(X+1) derived from the previous epoch
when a new node joins the cluster (and not when some node fails)? Is this
implemented by a simple copy-on-write of the directory and thus depends on
btrfs?

Also, to avoid the /{store_dir}/obj/{epoch} directory block indirection
(which hurts performance) with a large number of vdis, isn't it better to
store data objects as

  /{store_dir}/obj/{epoch}/{vdi_object_id}/{data_object_id}

instead of

  /{store_dir}/obj/{epoch}/{data_object_id} ?

>>> and VMs don't access the vdi
>>> directory at all after they open the vdi.
>>>
>>> The data object IDs of each vdi are stored to the vdi object. The vdi
>>> object is accessed each time the VM allocate a new data object, but
>>> there is no bottleneck server. It is because the vdi objects are
>>> distributed over the all nodes like data objects.
>>>
>> Thats not that i observed, all vdi metadata objects are currently ending
>> up on the very same set of servers.
>> So if I have 100 nodes and used --copies=3 CURRENTLY there will be just
>> 3 vdi metadata servers for all VMs.
>>
>>
>
> The node of the vdi directory is fixed, but the vdi object may be
> created in the other nodes like data objects.
>

Since the vdi directory nodes are fixed, how is this info maintained on the
master server, and more importantly, how does this info survive when the
master server dies or when the cluster is rebooted?

AFAIU, since the vdi directory nodes are fixed and there is a single master
server (even though it is automatically elected) coordinating access to the
vdi metadata servers, sheepdog is not fully distributed storage. Maybe if
the vdi metadata directory were stored in a distributed hash across the
sheepdog servers, and any server could be contacted to read/write the vdi
directory information, there would be no need for a coordinating master
server.

Thanks,
Alex
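
P.S. Since I refer to it above: here is a rough sketch of what I mean by
"deriving the nodes from the object id and the node ids alone". This is NOT
sheepdog's actual code -- the hash function, the made-up node ids, the
cluster size and the one-point-per-node ring are all just for illustration
(real implementations usually put many virtual points per node and skip
duplicate nodes when walking the ring). The same idea is what I have in mind
for the vdi directory: hash the vdi name/object id the same way and any node
can find the right servers without asking a master.

/*
 * Toy consistent-hashing placement sketch (not sheepdog code).
 * Build with: gcc -std=c99 -o ring ring.c
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_NODES   5      /* pretend cluster size */
#define NR_COPIES  3      /* --copies=3           */

/* toy 64-bit FNV-1a hash, stands in for whatever hash is really used */
static uint64_t hash64(uint64_t x)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (int i = 0; i < 8; i++) {
		h ^= (x >> (i * 8)) & 0xff;
		h *= 0x100000001b3ULL;
	}
	return h;
}

struct node {
	uint64_t id;    /* made up; e.g. derived from address + port */
	uint64_t point; /* position on the hash ring                 */
};

static int cmp_point(const void *a, const void *b)
{
	const struct node *x = a, *y = b;
	return (x->point > y->point) - (x->point < y->point);
}

/*
 * Pick the first NR_COPIES nodes clockwise from hash(oid) on the ring.
 * Every node can compute this locally from the object id plus the
 * current node list -- no lookup table, no master.
 */
static void find_nodes(uint64_t oid, const struct node *ring, int nr,
		       int *out, int copies)
{
	uint64_t h = hash64(oid);
	int start = 0;

	while (start < nr && ring[start].point < h)
		start++;

	for (int i = 0; i < copies; i++)
		out[i] = (start + i) % nr;     /* wrap around the ring */
}

int main(void)
{
	struct node ring[NR_NODES];
	int idx[NR_COPIES];

	for (int i = 0; i < NR_NODES; i++) {
		ring[i].id = 0x0a000001ULL + i;   /* fake node ids */
		ring[i].point = hash64(ring[i].id);
	}
	qsort(ring, NR_NODES, sizeof(ring[0]), cmp_point);

	/* read the zero-size file name 0000000000040000 as an object id */
	uint64_t vdi_oid = 0x0000000000040000ULL;

	find_nodes(vdi_oid, ring, NR_NODES, idx, NR_COPIES);

	printf("object %016llx goes to nodes:",
	       (unsigned long long)vdi_oid);
	for (int i = 0; i < NR_COPIES; i++)
		printf(" %016llx", (unsigned long long)ring[idx[i]].id);
	printf("\n");
	return 0;
}

Running it just prints which three of the five toy nodes would hold that
object; adding or removing a node changes the ring and thus the answer,
which is why I asked how the epoch directories relate to the previous ones.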