<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 26, 2016 at 4:19 PM, Dong Wu <span dir="ltr"><<a href="mailto:archer.wudong@gmail.com" target="_blank">archer.wudong@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks for your reply.<br>

<span class=""><br>

2016-05-26 10:34 GMT+08:00 Hitoshi Mitake <<a href="mailto:mitake.hitoshi@gmail.com">mitake.hitoshi@gmail.com</a>>:<br>

><br>

><br>

> On Tue, May 24, 2016 at 6:46 PM, Dong Wu <<a href="mailto:archer.wudong@gmail.com">archer.wudong@gmail.com</a>> wrote:<br>

>><br>

>> Hi Mitake,<br>

>> I have some questions about Sheepdog's write policy.<br>

>> For replication, Sheepdog writes 3 copies by default and is strongly<br>

>> consistent.<br>

>> My questions are:<br>

>> 1) If some replicas are written successfully and others fail, will it<br>

>> keep retrying the write until all 3 replicas succeed? And if fewer than<br>

>> 3 nodes are left, will it write fewer than 3 replicas and still return<br>

>> success to the client?<br>

><br>

><br>

> In the case of a disk or network I/O error, sheep returns an error to its<br>

> client immediately. In some cases (e.g. an epoch increase caused by a node<br>

> joining or leaving), it will retry.<br>
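A minimal sketch of the write policy described above, under a simplified in-memory model: the write goes to every replica, any disk or network I/O error is reported to the client immediately, and only an epoch change (a node joining or leaving) triggers a retry. The names Replica, ReplicaIOError, EpochChanged and replicated_write are hypothetical and are not taken from sheep's C code.<br>

```python
class ReplicaIOError(Exception):
    """Hypothetical: a disk or network I/O error on one replica."""


class EpochChanged(Exception):
    """Hypothetical: cluster membership changed (node join/leave) mid-write."""


class Replica:
    """Hypothetical in-memory stand-in for one node's copy of the objects."""

    def __init__(self, name, fail_with=None):
        self.name = name
        self.data = {}              # object id -> bytes
        self.fail_with = fail_with  # exception to raise on the next write, once

    def write(self, oid, payload):
        if self.fail_with is not None:
            exc, self.fail_with = self.fail_with, None
            raise exc
        self.data[oid] = payload


def replicated_write(replicas, oid, payload, max_retries=3):
    """Return "ok" only if every replica stored payload, else "error"."""
    for _attempt in range(max_retries + 1):
        errors = []
        # A real gateway would forward to the replicas concurrently;
        # this loop is sequential for simplicity.
        for replica in replicas:
            try:
                replica.write(oid, payload)
            except (ReplicaIOError, EpochChanged) as exc:
                errors.append(exc)
        if not errors:
            return "ok"
        if any(isinstance(e, ReplicaIOError) for e in errors):
            # Disk/network I/O error: report to the client at once, no retry.
            return "error"
        # Only epoch changes (node join/leave): retry the whole write.
    return "error"
```

With one replica hitting an I/O error, the other two already hold the new data when "error" is returned, which is exactly the partial-write situation the second question asks about.<br>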

<br>

</span>Will the client retry? If the error is caused by only one of the<br>

replicas (e.g. that replica's disk has failed) and the other two were<br>

written successfully, is it reasonable to return an error to the client? Why not<br>

just return success to the client and then recover the failed replica?<br></blockquote><div><br></div><div>It is to ensure that all 3 replicas are consistent. Sheepdog's interface is a virtual disk, so consistency is more important than availability.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

><br>

>><br>

>> 2) If some replicas are written successfully, others fail, and a failure is<br>

>> returned to the client, how is consistency between these replicas handled<br>

>> (nodes where the write succeeded have the new data, but nodes where it<br>

>> failed have the old data)? If the client reads the same block, will it<br>

>> read the new data or the old data?<br>

><br>

><br>

> In such a case, we need to repair consistency with the "dog vdi check" command.<br>

> Note that in such a case the affected VDIs won't be written by VMs anymore<br>

> because they will be used in read-only mode.<br>
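The repair step can be sketched as follows. This is only one plausible strategy, assumed for illustration: compare a checksum of every replica of an object and overwrite the minority copies with the majority copy. It is not taken from the "dog vdi check" source, whose actual logic may differ; repair_object and the dict-per-node model are hypothetical.<br>

```python
import hashlib
from collections import Counter


def digest(payload):
    """Checksum used to compare replicas without comparing full data."""
    return hashlib.sha1(payload).hexdigest()


def repair_object(replicas, oid):
    """Hypothetical repair: make every replica of oid match the majority copy.

    replicas is a list of dicts (one per node) mapping object id -> bytes.
    Returns the number of replicas that were overwritten.
    """
    copies = [r.get(oid) for r in replicas]
    present = [c for c in copies if c is not None]
    if not present:
        return 0  # no node holds this object; nothing to repair from
    counts = Counter(digest(c) for c in present)
    winner_digest, _ = counts.most_common(1)[0]
    good = next(c for c in present if digest(c) == winner_digest)
    repaired = 0
    for r in replicas:
        if r.get(oid) != good:
            r[oid] = good  # overwrite stale or missing copy
            repaired += 1
    return repaired
```

Majority voting is just one policy: if a write reached only one of three replicas, this sketch would roll that replica back to the old data.<br>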

<br>

</span>Does this mean data can't be read from this VDI until recovery is done?<br>

I remember that in an old version of Sheepdog, the read I/O path first<br>

checked the replicas' consistency and then read the data,<br>

but I can't find that logic anymore in the latest version.<br></blockquote><div><br></div><div>The data can still be read, and it would actually work well in many cases.</div><div><br></div><div>I'm not sure about that feature of the old version, but it seems costly for the ordinary read path. Reviving it as an option would be reasonable, though. What do you think?</div><div><br></div><div>Thanks,</div><div>Hitoshi</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

><br>

> Thanks,<br>

> Hitoshi<br>

><br>

>><br>

>><br>

>> Thanks a lot.<br>

><br>

><br>

</div></div></blockquote></div><br></div></div>
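<div>Reviving the old read-path consistency check as an option, as suggested in the reply above, might look roughly like this sketch. It assumes each replica is modeled as a dict from object id to bytes; read_object, its check_consistency flag and InconsistentReplicas are hypothetical names, not an existing sheep option. With the flag off, the read is served from one replica; with it on, every replica is read and compared first, which costs extra reads on every request.</div>

```python
class InconsistentReplicas(Exception):
    """Hypothetical: raised when replicas of an object disagree."""


def read_object(replicas, oid, check_consistency=False):
    """Read oid, optionally verifying that all replicas hold identical data."""
    if not check_consistency:
        # Fast path: trust the first replica that has the object.
        for r in replicas:
            if oid in r:
                return r[oid]
        raise KeyError(oid)
    # Checked path: read every replica and require them all to match.
    copies = [r.get(oid) for r in replicas]
    if any(c != copies[0] for c in copies):
        raise InconsistentReplicas(oid)
    if copies[0] is None:
        raise KeyError(oid)
    return copies[0]
```

<div>With the check off, whether a client sees the new or the old data depends on which replica serves the read; with it on, the mismatch is detected instead of silently returning one version.</div>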