[sheepdog] [PATCH 09/17] block: Refactor bdrv_has_zero_init{, _truncate}
Max Reitz
mreitz at redhat.com
Wed Feb 5 18:55:48 CET 2020
On 05.02.20 15:07, Eric Blake wrote:
> On 2/5/20 1:51 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>>>>> +typedef enum {
>>>>> + /*
>>>>> + * bdrv_known_zeroes() should include this bit if the contents of
>>>>> + * a freshly-created image with no backing file reads as all
>>>>> + * zeroes without any additional effort. If .bdrv_co_truncate is
>>>>> + * set, then this must be clear if BDRV_ZERO_TRUNCATE is clear.
>>>>
>>>> I understand that this is preexisting logic, but could I ask: why?
>>>> What's wrong
>>>> if driver can guarantee that created file is all-zero, but is not sure
>>>> about
>>>> file resizing? I agree that it's normal for these flags to have the
>>>> same
>>>> value,
>>>> but what is the reason for this restriction?..
>>>
>>> If areas added by truncation (or growth, rather) are always zero, then
>>> the file can always be created with size 0 and grown from there. Thus,
>>> images where truncation adds zeroed areas will generally always be zero
>>> after creation.
>>
>> This means, that if truncation bit is set, than create bit should be
>> set.. But
>> here we say that if truncation is clear, than create bit must be clear.
>
> Max, did we get the logic backwards?
Or maybe my explanation was just wrong.
Because nobody actually forces a driver to use truncate to ensure that
an newly created file will be 0. Hm. And more importantly, you can’t
use truncate with PREALLOC_MODE_OFF when you want to create an image
with preallocation.
Let’s see. The offending commit message says:
> No .bdrv_has_zero_init() implementation returns 1 if growing the file
> would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it
> in lieu of this new function was always safe.
>
> But on the other hand, it is possible that growing an image that is not
> zero-initialized would still add a zero-initialized area, like when
> using nonpreallocating truncation on a preallocated image. For callers
> that care only about truncation, not about creation with potential
> preallocation, this new function is useful.
So I suppose the explanation is just the preallocation mode alone;
has_zero_init() is for the image’s actual preallocation mode, whereas
has_zero_init_truncate() is forced to PREALLOC_MODE_OFF. As such, the
latter is less strict than the former. So the former cannot be true
when the latter is false.
>>>> So, the only possible combination of flags, when they differs, is
>>>> create=0 and
>>>> truncate=1.. How is it possible?
>>>
>>> For preallocated qcow2 images, it depends on the storage whether they
>>> are actually 0 after creation. Hence qcow2_has_zero_init() then defers
>>> to bdrv_has_zero_init() of s->data_file->bs.
>>>
>>> But when you truncate them (with PREALLOC_MODE_OFF, as
>>> BlockDriver.bdrv_has_zero_init_truncate()’s comment explains), the new
>>> area is always going to be 0, regardless of initial preallocation.
>>
>> ah yes, due to qcow2 zero clusters.
>
> Hmm. Do we actually set the zero flag on unallocated clusters when
> resizing a qcow2 image?
No. They are just unallocated, i.e. zero. (Nodes with backing files
never return true for bdrv_has_zero_init_truncate anyway).
> That would be an O(n) operation (we have to
> visit the L2 entry for each added cluster, even if only to set the zero
> cluster bit). Or do we instead just rely on the fact that qcow2 is
> inherently sparse, and that when you resize the guest-visible size
> without writing any new clusters, then it is only subsequent guest
> access to those addresses that finally allocate clusters, making resize
> O(1) (update the qcow2 metadata cluster, but not any L2 tables) while
> still reading 0 from the new data. To some extent, that's what the
> allocation mode is supposed to control.
>
> What about with external data images, where a resize in guest-visible
> length requires a resize of the underlying data image? There, we DO
> have to worry about whether the data image resizes with zeroes (as in
> the filesystem) or with random data (as in a block device).
Well, partially: Namely, only with data_file_raw. Because otherwise the
clusters are still unallocated and thus read as zero. So yes, then we
do have to worry about that.
With data_file_raw, we have an obligation to make the data file return
the same data as the qcow2 file, so, um. I wonder whether we actually
take any care of this yet. If you have some external data file without
zero_init(_truncate), do get zeroes when reading from the qcow2 node,
but non-zeroes when reading from the raw data file? That would be OK
without data_file_raw, but not with it. I suppose I’ll have to test it.
Max
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://lists.wpkg.org/pipermail/sheepdog/attachments/20200205/fe28d163/attachment-0001.sig>
More information about the sheepdog
mailing list