[sheepdog-users] Help? Creeping Errors "no inode has ..." with 0.9.1

Mon Jan 26 16:11:29 CET 2015

I've been seeing a growing number of errors in my logs of the form
"failed No object found, remote address: XXXXXXX:7000, op name:
READ_PEER", along with corresponding "no inode has ...." errors when I
run a cluster check.

At the beginning of last week I had no errors. Over the course of the
week it grew to one VDI missing a hundred or so inodes, and now multiple
VDIs are each missing hundreds of objects.

I haven't seen any issues with the underlying hardware, disks, or
ZooKeeper on the nodes over the same period.

What is causing this data loss? How can I debug it? How can I stop it?
Is there any chance I can repair the missing inodes?

I have 5 sheepdog storage nodes, which also run ZooKeeper. I have
another 8 "gateway only" nodes that are part of the node pool but run
only a gateway and cache.

I have about a dozen VDI images, and they've been fairly static over the
last week while I've been testing, with little write activity.

~ thornton