At Thu, 22 Sep 2011 14:34:17 +0800,
Liu Yuan wrote:
>
> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> > At Wed, 21 Sep 2011 14:59:26 +0800,
> >
> > After that, we get the consistent epoch like the follows.
> >
> >   Creation time        Epoch  Nodes
> >   2011-09-22 14:18:33      6  [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> >   2011-09-22 14:18:33      5  [10.68.14.1:7000, 10.68.14.1:7001]
> >   2011-09-22 14:18:33      4  [10.68.14.1:7000]
> >   2011-09-22 14:18:33      3  [10.68.14.1:7002]
> >   2011-09-22 14:18:33      2  [10.68.14.1:7001, 10.68.14.1:7002]
> >   2011-09-22 14:18:33      1  [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> >
> > In this case, Sheepdog discards all the objects which were stored
> > before epoch 4. It is because there is no overlap between epoch 3 and
> > 4, and Sheepdog cannot handle this situation now.
> >
> > I think this can be fixed with a small change. I'll dig into this
> > issue.
> >
> >
> > Thanks,
> >
> > Kazutaka
> Hi Kazutaka,
> I also noticed the objects discarded by sheepdog after the similar
> situation, but I have no idea of it for now. would you please elaborate
> a bit more detailed reason for this specified situation?

In the recovery phase, Sheepdog recovers objects from the previous
epoch to the current epoch. If the target object is not found in the
previous epoch, Sheepdog searches for it in the epoch before that, and
keeps going back to older epochs until it finds the target object.

In the above situation, when the target objects are not found in
epochs 6, 5, and 4, Sheepdog searches for them in epoch 3. However,
the information for epoch 3 is stored only on [10.68.14.1:7002], so
[10.68.14.1:7000] and [10.68.14.1:7001] don't know which nodes are
included in epoch 3. Similarly, [10.68.14.1:7002] doesn't have epoch
4, so none of the sheep daemons can go back from epoch 4 to epoch 3.

I think the solution would be simple; we only have to support getting
epoch information from remote nodes.

Thanks,

Kazutaka
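
To illustrate the backward epoch search described above, here is a minimal Python sketch. It is only a toy model, not Sheepdog's actual code: the names `EPOCHS`, `LOCAL_LOGS`, `lookup_epoch`, and `oldest_reachable_epoch` are hypothetical, and it assumes that a node keeps the membership list only for the epochs it belonged to, which matches the situation in this thread.

```python
# Toy model of the epoch history quoted in this thread; not Sheepdog code.
A, B, C = "10.68.14.1:7000", "10.68.14.1:7001", "10.68.14.1:7002"
EPOCHS = {1: [A, B, C], 2: [B, C], 3: [C], 4: [A], 5: [A, B], 6: [A, B, C]}

# Assumption: a node stores the membership list only for epochs it was part of.
LOCAL_LOGS = {
    node: {e: members for e, members in EPOCHS.items() if node in members}
    for node in (A, B, C)
}


def lookup_epoch(node, epoch, allow_remote):
    """Return the membership list of `epoch` as seen from `node`, or None."""
    members = LOCAL_LOGS[node].get(epoch)
    if members is None and allow_remote:
        # Proposed fix: ask the other nodes for the missing epoch information.
        for peer, log in LOCAL_LOGS.items():
            if peer != node and epoch in log:
                return log[epoch]
    return members


def oldest_reachable_epoch(node, current, allow_remote):
    """Walk back from `current`; return the oldest epoch `node` can resolve."""
    epoch = current
    while epoch > 1 and lookup_epoch(node, epoch - 1, allow_remote) is not None:
        epoch -= 1
    return epoch
```

With local lookups only, `oldest_reachable_epoch(A, 6, False)` stops at epoch 4, because 10.68.14.1:7000 has no record of epoch 3. With `allow_remote=True`, every node can walk back to epoch 1.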