[sheepdog] [PATCH 09/17] block: Refactor bdrv_has_zero_init{, _truncate}

Wed Feb 5 15:36:26 CET 2020

On 2/5/20 8:25 AM, Vladimir Sementsov-Ogievskiy wrote:

>>>> But when you truncate them (with PREALLOC_MODE_OFF, as
>>>> BlockDriver.bdrv_has_zero_init_truncate()’s comment explains), the new
>>>> area is always going to be 0, regardless of initial preallocation.
>>>
>>> ah yes, due to qcow2 zero clusters.
>>
>> Hmm. Do we actually set the zero flag on unallocated clusters when 
>> resizing a qcow2 image?  That would be an O(n) operation (we have to 
>> visit the L2 entry for each added cluster, even if only to set the 
>> zero cluster bit).  Or do we instead just rely on the fact that qcow2 
>> is inherently sparse, and that when you resize the guest-visible size 
>> without writing any new clusters, then it is only subsequent guest 
>> access to those addresses that finally allocate clusters, making 
>> resize O(1) (update the qcow2 metadata cluster, but not any L2 tables) 
>> while still reading 0 from the new data.  To some extent, that's what 
>> the allocation mode is supposed to control.
> 
> We must mark as ZERO new cluster at least if there is a _larger_ backing 
> file, to prevent data from backing file become available for the guest. 
> But we don't do it. It's a bug and there is fixing series from Kevin, 
> I've just pinged it:
> "[PATCH for-4.2? v3 0/8] block: Fix resize (extending) of short overlays"

There's a difference for a backing file larger than the qcow2 file, and 
the protocol layer larger than the qcow2 file.  Visually, with the 
following four nodes:

f1 [qcow2 format] <- f2 [qcow2 format]
   v                        v
p1 [file protocol]    p2 [file protocol]

If f1 is larger than f2, then resizing f2 without writing zero clusters 
leaks the data from f1 into f2.  The block layer knows this: prior to 
this series, bdrv_has_zero_init_truncate() returns 0 if bs->backing is 
present; and even in this series, see patch 6/17 which continues to 
force a 0 return rather than calling into the driver if the sizes are 
suspect.  But that is an uncommon corner case; in short, the qcow2 
callback .bdrv_has_zero_init_truncate is NOT reachable in that scenario, 
whether before or after this series.

On the other hand, if p2 is larger than f2, resizing f2 reads zeroes. 
That's because qcow2 HAS to add L2 mappings into p2 before data from p2 
can leak, but .bdrv_co_truncate(PREALLOC_MODE_OFF) does not add any L2 
mappings.  Thus, qcow2 blindly returning 1 for 
.bdrv_has_zero_init_truncate was correct (other than the anomaly of 
bs->encrypted, also fixed earlier in this series).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org