[sheepdog-users] Help? Creeping Errors "no inode has ..." with 0.9.1

Thornton Prime thornton.prime at gmail.com
Tue Jan 27 16:49:39 CET 2015


Thanks. I have been using cache -- so if that is unstable that would
explain a lot. I'm disabling cache to see how much that helps.
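For anyone following along: disabling the object cache amounts to restarting sheep without the -w option. A rough sketch (the size argument and store path below are made-up placeholders, not values from this cluster; only the -w flag itself is confirmed in this thread):

```shell
# Object cache enabled (what the gateway nodes were running):
#   sheep -w size=20G /var/lib/sheepdog

# Object cache disabled -- same invocation with -w dropped:
#   sheep /var/lib/sheepdog
```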

Attached is the output of "dog cluster info". I have a few MB of logs ...
I'll see where I can post them.

I am seeing a strong correlation between snapshots and the corrupted
VDIs. All the VDIs that have missing inodes are part of a daily snapshot
schedule. All the VDIs that are not part of the snapshot schedule are
fine. All the nodes have object cache enabled.
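To keep tracking which VDIs are affected, a loop like the following could run "dog vdi check" against every VDI. The heredoc stands in for `dog vdi list -r` output, and the column layout (state flag, then name) is my assumption about its raw format; replace the stub with the real command:

```shell
#!/bin/sh
# Sketch: enumerate VDI names and check each one.
vdi_names() {
  # Stand-in for:  dog vdi list -r
  # Assumed columns: state flag ("=" current, "s" snapshot), then name.
  cat <<'EOF'
= vm-disk-01 1 10737418240
s vm-disk-01 2 10737418240
= vm-disk-02 1 21474836480
EOF
}

# Snapshot rows share the VDI name, so check each unique name once.
for v in $(vdi_names | awk '{print $2}' | sort -u); do
  echo "dog vdi check $v"   # drop "echo" to actually run the check
done
```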

Thanks ... I'll see if I can collect more data and reproduce the problem
more consistently.

~ thornton prime

> Hitoshi Mitake <mailto:mitake.hitoshi at lab.ntt.co.jp>
> January 26, 2015 at 8:17 PM
> At Mon, 26 Jan 2015 07:11:29 -0800,
> Thornton Prime wrote:
>> I've been getting increasing errors in my logs that "failed No object
>> found, remote address: XXXXXXX:7000, op name: READ_PEER" and then
>> corresponding errors that "no inode has ...." when I do a cluster check.
>
> Could you provide detailed logs and an output of "dog cluster info"?
>
>> At the beginning of last week I had no errors, and over the course of a
>> week it grew to be one VDI missing some hundred inodes, and now it is
>> multiple VDIs each missing hundreds of objects.
>>
>> I haven't seen any issues with the underlying hardware, disks, or
>> zookeeper on the nodes over the same period.
>>
>> What is causing this data loss? How can I debug it? How can I stem it?
>> Any chances I can repair the missing inodes?
>>
>> I have 5 sheepdog storage nodes, also running Zookeeper. I have another
>> 8 "gateway only" nodes that are part of the node pool, but only running
>> a gateway and cache.
>
> Object cache (a feature which can be activated with the -w option of
> sheep) is quite unstable. Please do not use it for serious purposes.
>
> Thanks,
> Hitoshi
> Thornton Prime <mailto:thornton.prime at gmail.com>
> January 26, 2015 at 7:11 AM
>
> I have about a dozen VDI images, and they've been fairly static for the
> last week while I've been testing -- not a lot of write activity.
>
> ~ thornton
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cluster_info.txt
URL: <http://lists.wpkg.org/pipermail/sheepdog-users/attachments/20150127/b42aa6f6/attachment-0005.txt>

