[sheepdog] [PATCH v3 09/17] block/io: support int64_t bytes in bdrv_co_p{read, write}v_part()

Eric Blake eblake at redhat.com
Fri May 22 21:34:23 CEST 2020


On 4/30/20 6:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
> remaining dependencies now.
> 
> Series: 64bit-block-status
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
> ---
>   include/block/block_int.h |  4 ++--
>   block/io.c                | 16 ++++++++--------
>   block/trace-events        |  4 ++--
>   3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c8daba608b..3c2a1d741a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -975,13 +975,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
>       int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
>       BdrvRequestFlags flags);
>   int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes,
> +    int64_t offset, int64_t bytes,
>       QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
>   int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>       int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
>       BdrvRequestFlags flags);
>   int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes,
> +    int64_t offset, int64_t bytes,
>       QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);

Callers for these two functions:

block-backend.c:blk_do_pwritev_part() - currently passes unsigned int

filter-compress.c:compress_co_preadv_part() - passes uint64_t from 
.bdrv_co_preadv_part, which in turn is called from:
  - io.c:bdrv_driver_preadv() - callers analyzed earlier this series, 
where we know we are currently capped at <2G
  - qcow2-cluster.c:do_perform_cow_read() - passes size_t qiov->size, 
but we further know qcow2 cow is limited to cluster size of 2M
  - qcow2.c:qcow2_load_vmstate() - passes size_t qiov->size, tracing 
whether that ever exceeds 32 bits (on 64-bit platforms) is harder

filter-compress.c:compress_co_pwritev_part() - ditto, but for 
.bdrv_co_pwritev_part, which in turn is called from:
  - io.c:bdrv_driver_pwritev() - callers analyzed earlier this series, 
where we know we are currently capped at <2G
  - qcow2.c:qcow2_save_vmstate() - passes size_t qiov->size, tracing 
whether that ever exceeds 32 bits (on 64-bit platforms) is harder

io.c:bdrv_co_preadv() - currently passes unsigned int

io.c:bdrv_co_pwritev() - currently passes unsigned int

qcow2.c:qcow2_co_preadv_task() - passes uint64_t from 
qcow2_co_preadv_part(), which in turn is called from:
  - .bdrv_co_preadv_part - analyzed above

qcow2.c:qcow2_co_pwritev_task() - passes uint64_t from 
qcow2_co_pwritev_part(), which in turn is called from:
  - .bdrv_co_pwritev_part - analyzed above
  - qcow2_co_truncate() - passes uint64_t but it is clamped to 1 
cluster, at most 2M

In summary, it looks like even with our new 64-bit bytes parameter, most 
(all?) callers are still clamped to 32 bits.  But if we later relax 
callers, we want to see how bytes is used within these functions.

>   
>   static inline int coroutine_fn bdrv_co_pread(BdrvChild *child,
> diff --git a/block/io.c b/block/io.c
> index d336e4e691..d7fd429345 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1488,7 +1488,7 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
>    */
>   static bool bdrv_pad_request(BlockDriverState *bs,
>                                QEMUIOVector **qiov, size_t *qiov_offset,
> -                             int64_t *offset, unsigned int *bytes,
> +                             int64_t *offset, int64_t *bytes,
>                                BdrvRequestPadding *pad)
>   {
>       if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {

Callers:
bdrv_do_preadv_part() - adjusted to int64_t in this patch
bdrv_do_pwritev_part() - adjusted to int64_t in this patch

Usage:
     if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
  - takes int64_t, but now has to be checked for 64-bit safety below

     qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
                              *qiov, *qiov_offset, *bytes,
                             pad->buf + pad->buf_len - pad->tail, pad->tail);
  - takes size_t, risky on 32-bit platforms if any of our callers ever 
pass in a value larger than 32 bits.  I'd feel much better with an 
assertion that bytes <= SIZE_MAX.
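
Something like this would do, as a sketch (the exact placement, right 
before the qemu_iovec_init_extended() call, is just my assumption):

     /* hypothetical guard: *bytes must still fit the size_t parameter */
     assert(*bytes <= SIZE_MAX);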

     *bytes += pad->head + pad->tail;
  - corner-case risk of overflow for an image near 63-bit limits (nbdkit 
can generate such an image, but real images do not tickle this); the 
risk can be mitigated if we insist that no images are larger than 
QEMU_ALIGN_DOWN(INT64_MAX, request_alignment), as we would be unable to 
access the unaligned tail bytes of such an image
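
One way to spell out that insistence, as a rough sketch (the placement in 
bdrv_check_byte_request() and the variable name are my assumptions):

     /* hypothetical cap: refuse requests past the last aligned offset */
     int64_t max_len = QEMU_ALIGN_DOWN(INT64_MAX, bs->bl.request_alignment);

     if (offset > max_len || bytes > max_len - offset) {
         return -EIO;
     }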


bdrv_init_padding():
     sum = pad->head + bytes + pad->tail;
  - ouch: this assigns to 'size_t sum' from 'bytes', which has always 
been an int64_t parameter; prior to this patch callers only ever passed 
32-bit values, but with this patch they pass genuine int64_t.  Even 
pre-patch, there are values of bytes that fit in 32 bits yet make this 
sum overflow on 32-bit platforms, and that potential overflow is not 
changed here.  We need a separate patch to fix this (sum needs to be 
int64_t), preferably earlier in the series.  (Patch 6/17 also flagged 
bdrv_init_padding as needing a fix.)
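
The fix I have in mind is essentially a one-liner (sketch only, assuming 
nothing else in the function relies on sum being size_t):

     /* widen sum so the addition cannot wrap on 32-bit hosts */
     int64_t sum = pad->head + bytes + pad->tail;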


> @@ -1515,7 +1515,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
>   
>   /* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
>   static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes,
> +    int64_t offset, int64_t bytes,
>       QEMUIOVector *qiov, size_t qiov_offset,
>       BdrvRequestFlags flags)
>   {

Callers:
bdrv_co_preadv_part() - adjusted in this patch
bdrv_rw_co_entry(), called by bdrv_prwv_co(), which is called by:
  - bdrv_pwrite_zeroes() - int bytes
  - bdrv_pwritev() - size_t bytes from qiov
  - bdrv_preadv() - size_t bytes from qiov

This fixes a latent issue where pre-patch callers could pass size_t and 
suffer truncation on a 64-bit platform; but in practice we never tickled 
that bug because of our insistence on <2G read/write.
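
Just to illustrate the narrowing the old prototype allowed (contrived 
numbers; the callers we know about never get this big):

     size_t request = 5ULL * 1024 * 1024 * 1024; /* 5 GiB qiov, 64-bit host */
     unsigned int narrow = request;  /* pre-patch: silently becomes 1 GiB */
     int64_t wide = request;         /* post-patch: 5 GiB preserved */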

Usage:

> @@ -1524,7 +1524,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
>       BdrvRequestPadding pad;
>       int ret;
>   
> -    trace_bdrv_co_preadv(bs, offset, bytes, flags);
> +    trace_bdrv_co_preadv_part(bs, offset, bytes, flags);
>   
>       ret = bdrv_check_byte_request(bs, offset, bytes);

takes int64_t (patch 3/17 in this series), and ensures the transaction 
is <2G for the rest of the function. If we ever relax 
bdrv_check_byte_request, we also have to worry about:

     tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
  - takes int64_t, audited earlier in this series

     ret = bdrv_aligned_preadv(child, &req, offset, bytes,
                               bs->bl.request_alignment,
                               qiov, qiov_offset, flags);
  - takes int64_t, audited earlier in this series


>       if (ret < 0) {
> @@ -1562,7 +1562,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
>   }
>   
>   int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes,
> +    int64_t offset, int64_t bytes,
>       QEMUIOVector *qiov, size_t qiov_offset,
>       BdrvRequestFlags flags)
>   {

Callers analyzed above, usage within the function:

     ret = bdrv_do_preadv_part(child, offset, bytes, qiov, qiov_offset, flags);
  - analyzed above

> @@ -1866,7 +1866,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>   
>   static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
>                                                   int64_t offset,
> -                                                unsigned int bytes,
> +                                                int64_t bytes,
>                                                   BdrvRequestFlags flags,
>                                                   BdrvTrackedRequest *req)
>   {

Caller: bdrv_do_pwritev_part(), adjusted in this patch

Usage:
     padding = bdrv_init_padding(bs, offset, bytes, &pad);
  - analyzed above

         ret = bdrv_aligned_pwritev(child, req, offset, aligned_bytes, align,
                                    NULL, 0, flags);
  - takes int64_t, audited earlier in the series
         assert(align == pad.tail + bytes);
  - changes from 'uint64_t == size_t + unsigned int' to 'uint64_t == 
size_t + int64_t', but the semantics do not change


> @@ -1941,7 +1941,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>   
>   /* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
>   static int coroutine_fn bdrv_do_pwritev_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
> +    int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
>       BdrvRequestFlags flags)
>   {
>       BlockDriverState *bs = child->bs;

Callers:
bdrv_co_pwritev_part() - adjusted in this patch
bdrv_do_pwrite_zeroes() - passes 'int'
bdrv_rw_co_entry() analyzed above

Usage:

> @@ -1950,7 +1950,7 @@ static int coroutine_fn bdrv_do_pwritev_part(BdrvChild *child,
>       BdrvRequestPadding pad;
>       int ret;
>   
> -    trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
> +    trace_bdrv_co_pwritev_part(child->bs, offset, bytes, flags);
>   
>       if (!bs->drv) {
>           return -ENOMEDIUM;

     ret = bdrv_check_byte_request(bs, offset, bytes);
  - analyzed above; if we ever relax it, we must also worry about:

     tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
  - analyzed above

         ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
  - analyzed above

     if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
  - analyzed above

     ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
                                qiov, qiov_offset, flags);
  - takes int64_t, analyzed earlier this series


> @@ -2009,7 +2009,7 @@ out:
>   }
>   
>   int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
> -    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
> +    int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
>       BdrvRequestFlags flags)
>   {
>       int ret;

Callers analyzed above, usage:

     ret = bdrv_do_pwritev_part(child, offset, bytes, qiov, qiov_offset, flags);
  - analyzed above

> diff --git a/block/trace-events b/block/trace-events
> index 179b47bf63..dd367a9b19 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -11,8 +11,8 @@ blk_root_attach(void *child, void *blk, void *bs) "child %p blk %p bs %p"
>   blk_root_detach(void *child, void *blk, void *bs) "child %p blk %p bs %p"
>   
>   # io.c
> -bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
> -bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
> +bdrv_co_preadv_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"
> +bdrv_co_pwritev_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"

Interesting that these trace points always took 64-bit numbers; you are 
renaming them to match the fact that they are now called from different 
functions (since commit 1acc3466a2).  I'd feel better if 'flags' were 
kept as unsigned int, even though semantically it doesn't matter.
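
That is, something like this for the two lines (sketch of what I'd prefer):

     bdrv_co_preadv_part(void *bs, int64_t offset, int64_t bytes, unsigned int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"
     bdrv_co_pwritev_part(void *bs, int64_t offset, int64_t bytes, unsigned int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"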

>   bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
>   bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %" PRId64 " bytes %" PRId64 " cluster_offset %" PRId64 " cluster_bytes %" PRId64
>   bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
> 

I think if you fix bdrv_init_padding in a separate patch, then 
everything else here is safe.
Reviewed-by: Eric Blake <eblake at redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


