> > Where and how are those 4MB blocks stored locally?
>
> A 4MB block (we call it an "object") is simply stored as a file
> named after its object id.
> We are looking for a local key-value store to store objects more
> efficiently.
> We have tried Berkeley DB as local storage, but its performance is
> not good for 4MB objects.
> Berkeley DB seems to be tuned for smaller blocks.

And do you always write 4MB, or is it possible to write smaller blocks?

> > And how does the partition recovery algorithm work?
>
> When a failure has occurred, new partition information is sent from
> the JGroups master group, and the recovery thread moves objects based
> on the new partition information in the background.
> Vdi objects store the old partition version numbers with each

But 'partition version numbers' can be the same, although the data is
different (when the cluster was partitioned)?

> data object id, so the VM can get the old partition information
> which was used to store the data object at the time.
> By using the old partition information, the VM can access a data
> object even before its data has been recovered.

But how do you compare data? I mean, you need to make sure all nodes
have exactly the same data. Are you using some kind of hash/digest
(Tiger hash, Merkle tree)?

- Dietmar
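
To make the digest question concrete, here is a minimal sketch of what
comparing replicas by content hash could look like. It assumes the layout
described in the quoted reply (one file per object, named after its object
id). Everything else is an assumption for illustration: SHA-256 stands in
for whatever digest might be used, and the function names are hypothetical,
not part of the system under discussion.

```python
import hashlib
from pathlib import Path


def object_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one object file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_digests(store_dir):
    """Map object id (file name) -> digest for every object file in one node's store."""
    return {p.name: object_digest(p) for p in Path(store_dir).iterdir() if p.is_file()}


def diverging_objects(digests_a, digests_b):
    """Return the ids of objects held by both nodes whose contents differ."""
    shared = digests_a.keys() & digests_b.keys()
    return sorted(oid for oid in shared if digests_a[oid] != digests_b[oid])
```

With something like this, two nodes would only need to exchange the
id-to-digest maps rather than the 4MB objects themselves, and a mismatch
would show up even when both copies carry the same partition version
number. A Merkle tree over the digests would reduce the exchange further,
but that is not shown here.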