[Sheepdog] sheep: add transient failure detection

HaiTing Yao yaohaiting.wujue at gmail.com
Thu Mar 15 10:46:46 CET 2012


Following the review, I made the time window within which a node can leave and then rejoin a parameter of cluster format. This way, every node uses the same value.

I am sending the full patch again to make review and testing easier.

Currently, every node change leads to an epoch change. We cannot restart sheepdog and use the cluster again. Token loss in the cluster driver also crashes the cluster. This patch tries to fix these problems. If the patch meets our needs, I will update the I/O path for temporarily failed nodes. With the patch as it stands, any access to a temporarily failed node will block. I will reuse the object cache mechanism to update the I/O path later: use the object cache to store dirty objects, and dispatch the dirty objects when the node comes back.
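
As a rough illustration of the idea only (this is not the code in the patch), the sketch below shows how a leave followed by a rejoin inside the window could avoid an epoch change. All names here (node_tracker, cluster, grace_secs, node_left, check_timeout, node_joined) are hypothetical and chosen for the example; the only assumption taken from the description above is that the window is fixed at cluster format time and is the same on every node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node states for illustration; not taken from the patch. */
enum node_state {
	NODE_ALIVE,       /* normal cluster member */
	NODE_TEMP_FAILED, /* left, but still inside the rejoin window */
	NODE_REMOVED      /* window expired, membership really changed */
};

struct node_tracker {
	enum node_state state;
	uint64_t left_at; /* time in seconds when the node was seen leaving */
};

struct cluster {
	uint32_t epoch;      /* bumped only on a real membership change */
	uint64_t grace_secs; /* assumed to be set once at cluster format */
};

/* A node disappeared: remember when, but do not touch the epoch yet. */
void node_left(struct node_tracker *n, uint64_t now)
{
	assert(n != NULL);
	if (n->state == NODE_ALIVE) {
		n->state = NODE_TEMP_FAILED;
		n->left_at = now;
	}
}

/*
 * Called periodically. If the node stayed away longer than the window,
 * treat the failure as permanent and bump the epoch.
 * Returns true if the node was removed by this call.
 */
bool check_timeout(struct cluster *c, struct node_tracker *n, uint64_t now)
{
	assert(c != NULL && n != NULL);
	if (n->state == NODE_TEMP_FAILED && now - n->left_at > c->grace_secs) {
		n->state = NODE_REMOVED;
		c->epoch++;
		return true;
	}
	return false;
}

/*
 * A node (re)joined. Returns true if this was a transient failure,
 * that is, the node came back in time and the epoch is unchanged.
 */
bool node_joined(struct cluster *c, struct node_tracker *n, uint64_t now)
{
	assert(c != NULL && n != NULL);
	check_timeout(c, n, now); /* expire the window first if needed */

	if (n->state == NODE_TEMP_FAILED) {
		n->state = NODE_ALIVE; /* back in time: no epoch change */
		return true;
	}
	if (n->state == NODE_REMOVED) {
		n->state = NODE_ALIVE; /* joins as a new member */
		c->epoch++;
	}
	return false;
}
```

With a window of 30 seconds, a node that leaves at t=100 and rejoins at t=120 keeps the epoch as it was. A node that stays away past the window is removed with one epoch change, and its later join causes another.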

Thanks
HaiTing




