[sheepdog] questions about sheepdog write policy

Fri May 27 10:53:58 CEST 2016

2016-05-27 16:14 GMT+08:00 Hitoshi Mitake <mitake.hitoshi at gmail.com>:
>
>
> On Thu, May 26, 2016 at 4:19 PM, Dong Wu <archer.wudong at gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> 2016-05-26 10:34 GMT+08:00 Hitoshi Mitake <mitake.hitoshi at gmail.com>:
>> >
>> >
>> > On Tue, May 24, 2016 at 6:46 PM, Dong Wu <archer.wudong at gmail.com> wrote:
>> >>
>> >> Hi Mitake,
>> >> I have questions about sheepdog's write policy.
>> >> For replication, sheepdog writes 3 copies by default and is strongly
>> >> consistent.
>> >> My questions are:
>> >> 1) If some replicas are written successfully and others fail, will it
>> >> keep retrying the write until all 3 replicas succeed? And if fewer
>> >> than 3 nodes are left, will it write fewer than 3 replicas and
>> >> return success to the client?
>> >
>> >
>> > In the case of a disk or network I/O error, sheep returns an error to its
>> > client immediately. In some cases (e.g. an epoch increase caused by a node
>> > join/leave), it will retry.
>>
>> Will the client retry? If the error is caused by only one of the
>> replicas (e.g. that replica's disk is faulty), and the other two are fine
>> and were written successfully, is returning an error to the client
>> reasonable? Why not just return success to the client and then recover
>> the failed replica?
>
>
> It is for ensuring that the 3 replicas are consistent. Sheepdog's interface
> is a virtual disk, so consistency is more important than availability.

Sheepdog has no journal to guarantee the replicas' consistency, so it
should just return an error to the client when any replica write fails,
wait until the replicas are recovered to a consistent state, and only
then continue writing.
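
To make sure I understand the policy, here is a rough sketch of it. This is
not sheepdog code: the Replica class and replicated_write() are hypothetical
names, and the sketch tries every replica in turn, which may differ from what
sheep really does.

```python
class Replica:
    """Hypothetical in-memory stand-in for one node's object store."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.objects = {}

    def write(self, oid, data):
        if not self.healthy:
            raise IOError(f"{self.name}: disk or network I/O error")
        self.objects[oid] = data


def replicated_write(replicas, oid, data):
    """Return True only if every replica accepted the write.

    There is no journal and no rollback here, so replicas that already
    accepted the write keep the new data even when False is returned.
    """
    ok = True
    for replica in replicas:
        try:
            replica.write(oid, data)
        except IOError:
            ok = False
    return ok


replicas = [Replica("A"), Replica("B", healthy=False), Replica("C")]
result = replicated_write(replicas, 1, b"new")
```

Here the client would see an error although replicas A and C hold the new
data, which is the state that needs repair afterwards.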

Without any consistency log, sheepdog's recovery logic will scan all the
replicas' objects and check whether each object needs to be recovered.
Am I right?
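
What I have in mind is something like the following sketch. It only
illustrates my question and is not taken from sheepdog: the function name
find_divergent_objects() and the dict-based stores are my own assumptions.

```python
import hashlib


def find_divergent_objects(stores):
    """Return the sorted object IDs whose copies differ or are missing.

    `stores` is a list of dicts (one per replica) mapping object ID to
    bytes. With no log of failed writes, every object has to be compared.
    """
    all_oids = set()
    for store in stores:
        all_oids.update(store)

    divergent = []
    for oid in sorted(all_oids):
        digests = {
            (hashlib.sha1(store[oid]).hexdigest() if oid in store else None)
            for store in stores
        }
        if len(digests) > 1:  # at least two replicas disagree
            divergent.append(oid)
    return divergent


stores = [{1: b"a", 2: b"new"}, {1: b"a", 2: b"old"}, {1: b"a"}]
to_recover = find_divergent_objects(stores)
```

The cost grows with the total number of objects, not with the number of
failed writes.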

>
>>
>>
>> >
>> >>
>> >> 2) If some replicas are written successfully and others fail, and failure
>> >> is returned to the client, how is data consistency across these replicas
>> >> handled (the nodes where the write succeeded have new data, but the nodes
>> >> where it failed have old data)? If the client reads the same block, will
>> >> it read the new data or the old data?
>> >
>> >
>> > In such a case, we need to repair consistency with the "dog vdi check"
>> > command.
>> > Note that in such a case the failed VDIs won't be accessed from VMs
>> > anymore because they will be used in read-only mode.
>>
>> Does this mean data can't be read from this VDI until recovery is done?
>> I remember that in an old version of sheepdog, the read I/O path first
>> checked the replicas' consistency and then read the data,
>> but I can't find that logic anymore in the latest version.
>
>
> The data can be read and actually it would work well in many cases.

Consider this case: a write request succeeds on replicaA but fails on
replicaB, and an error is returned to the client (so replicaC never
receives the write request). Now replicaA has new data and replicaC has
old data. If the client then reads the VDI before "dog vdi check" is run,
will it always read the new data, always the old data, or sometimes new
and sometimes old?

And could the object on replicaB be partly new and partly old? How is an
object write guaranteed to be atomic?
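
To make the scenario concrete, here is a small sketch. It is not sheepdog
code: read_object() and the three dicts are hypothetical, and the "torn"
content on replicaB is only my assumption about what a partial write might
look like.

```python
def read_object(stores, oid, replica_index):
    """Read one copy without comparing it against the other replicas."""
    return stores[replica_index].get(oid)


OLD = b"OLDOLD"
NEW = b"NEWNEW"

replica_a = {7: NEW}                # write succeeded
replica_b = {7: NEW[:3] + OLD[3:]}  # write failed midway (assumed torn)
replica_c = {7: OLD}                # write request never arrived

stores = [replica_a, replica_b, replica_c]

# Which data the client sees depends only on which replica serves the read.
seen = {read_object(stores, 7, i) for i in range(len(stores))}
```

In this sketch the three replicas return three different results for the
same object, so a read is only stable if it always goes to the same replica.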

> I'm not sure about the feature of the old version, but it seems to be costly
> for the ordinary read path. But reviving it as an option would be reasonable.
> What do you think?

Yes, it is costly.
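
A rough way to see the cost, assuming 3 replicas and a check that compares
every copy on each read. CountingStore, plain_read() and checked_read() are
hypothetical names and not the old sheepdog implementation.

```python
class CountingStore:
    """Hypothetical replica store that counts how often it is read."""

    def __init__(self, objects):
        self.objects = objects
        self.reads = 0

    def read(self, oid):
        self.reads += 1
        return self.objects[oid]


def plain_read(stores, oid):
    """Read from a single replica: one replica I/O per client read."""
    return stores[0].read(oid)


def checked_read(stores, oid):
    """Read every copy and compare before returning any data."""
    copies = [store.read(oid) for store in stores]
    if any(copy != copies[0] for copy in copies[1:]):
        raise ValueError(f"object {oid}: replicas disagree")
    return copies[0]


stores = [CountingStore({1: b"x"}) for _ in range(3)]
plain_read(stores, 1)
reads_after_plain = sum(store.reads for store in stores)
checked_read(stores, 1)
reads_after_checked = sum(store.reads for store in stores)
```

In this sketch a checked read costs one I/O per replica instead of one in
total, so making it optional sounds reasonable to me.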

>
> Thanks,
> Hitoshi
>
>>
>>
>> >
>> > Thanks,
>> > Hitoshi
>> >
>> >>
>> >>
>> >> Thanks a lot.
>> >
>> >
>
>

Thanks,
Dong Wu