<html><head>

<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">

</head><body bgcolor="#FFFFFF" text="#000000">Thanks. I have been using 

the object cache -- so if it is unstable, that would explain a lot. I'm disabling 

it to see how much that helps.<br>

<br>

Attached is the output of "dog cluster info". I have a few MB of logs ... I'll see 

where I can post them to get the<br>

<br>

I am seeing a strong correlation between snapshots and the corrupted 

VDIs. All the VDIs that have missing inodes are part of a daily snapshot

 schedule. All the VDIs that are not part of the snapshot schedule are 

fine. All the nodes have object cache enabled.<br>

<br>
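For what it's worth, here is roughly how I am cross-checking the two lists. This is only a sketch: the VDI names are made up, and it assumes both lists are filled in by hand (the damaged VDIs from the cluster check, the scheduled ones from my snapshot schedule) rather than parsed from dog output.<br>
<pre wrap="">
```python
def split_by_schedule(damaged, scheduled, all_vdis):
    """Cross-tabulate VDIs by snapshot-schedule membership and damage."""
    damaged = set(damaged)
    scheduled = set(scheduled)
    unscheduled = set(all_vdis).difference(scheduled)
    return {
        "scheduled_damaged": sorted(scheduled.intersection(damaged)),
        "scheduled_ok": sorted(scheduled.difference(damaged)),
        "unscheduled_damaged": sorted(unscheduled.intersection(damaged)),
        "unscheduled_ok": sorted(unscheduled.difference(damaged)),
    }


# Hypothetical VDI names, for illustration only.
all_vdis = ["vm-a", "vm-b", "vm-c", "vm-d"]
scheduled = ["vm-a", "vm-b"]   # VDIs in the daily snapshot schedule
damaged = ["vm-a", "vm-b"]     # VDIs reported with missing inodes

table = split_by_schedule(damaged, scheduled, all_vdis)
for cell, names in table.items():
    print(cell, names)
```
</pre>
If the pattern I described holds, the "unscheduled_damaged" cell stays empty while "scheduled_damaged" does not.<br>
<br>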

Thanks ... I'll see if I can collect more data and reproduce the problem

 more consistently.<br>

<br>

~ thornton prime<br>

<br>

<blockquote style="border: 0px none;" 

cite="mid:87y4oo3lis.wl%25mitake.hitoshi@lab.ntt.co.jp" type="cite">

  <div style="margin:30px 25px 10px 25px;" class="__pbConvHr"><div 

style="display:table;width:100%;border-top:1px solid 

#EDEEF0;padding-top:5px">       <div 

style="display:table-cell;vertical-align:middle;padding-right:6px;"><img

 photoaddress="mitake.hitoshi@lab.ntt.co.jp" photoname="Hitoshi Mitake" 

src="cid:part1.00000307.09020108@gmail.com" 

name="compose-unknown-contact.jpg" height="25px" width="25px"></div>   <div

style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%">

        <a moz-do-not-send="true" href="mailto:mitake.hitoshi@lab.ntt.co.jp"

 style="color:#737F92 

!important;padding-right:6px;font-weight:bold;text-decoration:none 

!important;">Hitoshi Mitake</a></div>   <div 

style="display:table-cell;white-space:nowrap;vertical-align:middle;">   

  <font color="#9FA2A5"><span style="padding-left:6px">January 26, 2015 

at 8:17 PM</span></font></div></div></div>

  <div style="color:#888888;margin-left:24px;margin-right:24px;" 

__pbrmquotes="true" class="__pbConvBody"><pre wrap="">At Mon, 26 Jan 2015 07:11:29 -0800,

Thornton Prime wrote:

</pre><blockquote type="cite"><pre wrap="">I've been getting increasing errors in my logs that "failed No object

found, remote address: XXXXXXX:7000, op name: READ_PEER" and then

corresponding errors that "no inode has ...." when I do a cluster check.

</pre></blockquote><pre wrap=""><!---->

Could you provide detailed logs and an output of "dog cluster info"?

</pre><blockquote type="cite"><pre wrap="">At the beginning of last week I had no errors, and over the course of a

week it grew to be one VDI missing some hundred inodes, and now it is

multiple VDIs each missing hundreds of objects.

I haven't seen any issues with the underlying hardware, disks, or

zookeeper on the nodes in the course of the same time.

What is causing this data loss? How can I debug it? How can I stem it?

Any chances I can repair the missing inodes?

I have 5 sheepdog storage nodes, also running Zookeeper. I have another

8 "gateway only" nodes that are part of the node pool, but only running

a gateway and cache.

</pre></blockquote><pre wrap=""><!---->

Object cache (a functionality which can be activated with -w option of

sheep) is quite unstable. Please do not use it for serious purpose.

Thanks,

Hitoshi

</pre></div>

  <div style="margin:30px 25px 10px 25px;" class="__pbConvHr"><div 

style="display:table;width:100%;border-top:1px solid 

#EDEEF0;padding-top:5px">       <div 

style="display:table-cell;vertical-align:middle;padding-right:6px;"><img

 photoaddress="thornton.prime@gmail.com" photoname="Thornton Prime" 

src="cid:part2.00020502.01020803@gmail.com" name="postbox-contact.jpg" 

height="25px" width="25px"></div>   <div 

style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%">

        <a moz-do-not-send="true" href="mailto:thornton.prime@gmail.com" 

style="color:#737F92 

!important;padding-right:6px;font-weight:bold;text-decoration:none 

!important;">Thornton Prime</a></div>   <div 

style="display:table-cell;white-space:nowrap;vertical-align:middle;">   

  <font color="#9FA2A5"><span style="padding-left:6px">January 26, 2015 

at 7:11 AM</span></font></div></div></div>

  <div style="color:#888888;margin-left:24px;margin-right:24px;" 

__pbrmquotes="true" class="__pbConvBody"><div>I've been getting 

increasing errors in my logs that "failed No object<br>found, remote 

address: XXXXXXX:7000, op name: READ_PEER" and then<br>corresponding 

errors that "no inode has ...." when I do a cluster check.<br><br>At the

 beginning of last week I had no errors, and over the course of a<br>week

 it grew to be one VDI missing some hundred inodes, and now it is<br>multiple

 VDIs each missing hundreds of objects.<br><br>I haven't seen any issues

 with the underlying hardware, disks, or<br>zookeeper on the nodes in 

the course of the same time.<br><br>What is causing this data loss? How 

can I debug it? How can I stem it?<br>Any chances I can repair the 

missing inodes?<br><br>I have 5 sheepdog storage nodes, also running 

Zookeeper. I have another<br>8 "gateway only" nodes that are part of the

 node pool, but only running<br>a gateway and cache.<br><br>I have about

 a dozen VDI images, and they've been fairly static for the<br>last week 

while I've been testing -- not a lot of write activity.<br><br>~ 

thornton<br></div></div>

</blockquote>

</body></html>