[sheepdog] [PATCH v3 09/17] block/io: support int64_t bytes in bdrv_co_p{read, write}v_part()
Eric Blake
eblake at redhat.com
Fri May 22 21:34:23 CEST 2020
On 4/30/20 6:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
>
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
>
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
>
> So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
> remaining dependencies now.
>
> Series: 64bit-block-status
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
> ---
> include/block/block_int.h | 4 ++--
> block/io.c | 16 ++++++++--------
> block/trace-events | 4 ++--
> 3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c8daba608b..3c2a1d741a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -975,13 +975,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
> int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
> BdrvRequestFlags flags);
> int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes,
> + int64_t offset, int64_t bytes,
> QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
> int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
> int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
> BdrvRequestFlags flags);
> int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes,
> + int64_t offset, int64_t bytes,
> QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
Callers for these two functions:
block-backend.c:blk_do_pwritev_part() - currently passes unsigned int
filter-compress.c:compress_co_preadv_part() - passes uint64_t from
.bdrv_co_preadv_part, which in turn is called from:
- io.c:bdrv_driver_preadv() - callers analyzed earlier this series,
where we know we are currently capped at <2G
- qcow2-cluster.c:do_perform_cow_read() - passes size_t qiov->size,
but we further know qcow2 cow is limited to cluster size of 2M
- qcow2.c:qcow2_load_vmstate() - passes size_t qiov->size, tracing
whether that ever exceeds 32 bits (on 64-bit platforms) is harder
filter-compress.c:compress_co_pwritev_part() - ditto, but for
.bdrv_co_pwritev_part, which in turn is called from:
- io.c:bdrv_driver_pwritev() - callers analyzed earlier this series,
where we know we are currently capped at <2G
- qcow2.c:qcow2_save_vmstate() - passes size_t qiov->size, tracing
whether that ever exceeds 32 bits (on 64-bit platforms) is harder
io.c:bdrv_co_preadv() - currently passes unsigned int
io.c:bdrv_co_pwritev() - currently passes unsigned int
qcow2.c:qcow2_co_preadv_task() - passes uint64_t from
qcow2_co_preadv_part(), which in turn is called from:
- .bdrv_co_preadv_part - analyzed above
qcow2.c:qcow2_co_pwritev_task() - passes uint64_t from
qcow2_co_pwritev_part(), which in turn is called from:
- .bdrv_co_pwritev_part - analyzed above
- qcow2_co_truncate() - passes uint64_t but it is clamped to 1
cluster, at most 2M
In summary, it looks like even with our new 64-bit bytes parameter, most
(all?) callers are still clamped to 32 bits. But if we later relax
callers, we want to see how bytes is used within these functions.
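For the size_t callers, the failure mode to keep in mind is silent
truncation at the old 'unsigned int' parameter; the new int64_t
parameter widens losslessly. A toy illustration (hypothetical values,
64-bit host), just to spell out the conversions:

    size_t vmstate_bytes = 5ULL * 1024 * 1024 * 1024; /* imagine a 5G qiov */
    unsigned int old_arg = vmstate_bytes; /* silently truncates to 1G */
    int64_t new_arg = vmstate_bytes;      /* keeps the full 5G */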
>
> static inline int coroutine_fn bdrv_co_pread(BdrvChild *child,
> diff --git a/block/io.c b/block/io.c
> index d336e4e691..d7fd429345 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1488,7 +1488,7 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
> */
> static bool bdrv_pad_request(BlockDriverState *bs,
> QEMUIOVector **qiov, size_t *qiov_offset,
> - int64_t *offset, unsigned int *bytes,
> + int64_t *offset, int64_t *bytes,
> BdrvRequestPadding *pad)
> {
> if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
Callers:
bdrv_do_preadv_part() - adjusted to int64_t in this patch
bdrv_do_pwritev_part() - adjusted to int64_t in this patch
Usage:
if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
- takes int64_t, but now has to be checked for 64-bit safety below
qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
*qiov, *qiov_offset, *bytes,
pad->buf + pad->buf_len - pad->tail,
pad->tail);
- takes size_t, which is risky on 32-bit platforms if any of our callers
ever passes in a value larger than 32 bits. I'd feel much better with an
assertion that bytes <= SIZE_MAX (see the snippet after this list).
*bytes += pad->head + pad->tail;
- corner-case risk of overflow for an image near 63-bit limits (nbdkit
can generate such an image, but real images do not tickle this); the
risk can be mitigated if we insist that no images are larger than
QEMU_ALIGN_DOWN(INT64_MAX, request_alignment), as we would be unable to
access the unaligned tail bytes of such an image
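Concretely, I'm thinking of nothing fancier than this right before the
qemu_iovec_init_extended() call (untested sketch):

    assert(*bytes <= SIZE_MAX); /* only a real constraint on 32-bit hosts */

so the 32-bit limitation at least fails loudly instead of truncating.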
bdrv_init_padding():
sum = pad->head + bytes + pad->tail;
- ouch: this sets 'size_t sum' from what has always been an int64_t
parameter, although prior to this patch callers only passed 32-bit
values (with this patch, callers now pass actual int64_t values). But
even pre-patch, there are values of bytes that fit in 32 bits yet make
this sum overflow on 32-bit platforms, and that potential overflow is
not changed here. We need a separate patch to fix this (sum needs to be
int64_t), preferably earlier in the series. (Patch 6/17 also mentioned
bdrv_init_padding as needing a fix.)
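Roughly what I have in mind for that separate patch (untested sketch;
the rest of bdrv_init_padding() stays as-is):

    int64_t sum = pad->head + bytes + pad->tail; /* was: size_t sum */

so the sum is computed in 64 bits even on 32-bit hosts; any other uses
of 'sum' inside the function would need a matching audit.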
> @@ -1515,7 +1515,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
>
> /* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
> static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes,
> + int64_t offset, int64_t bytes,
> QEMUIOVector *qiov, size_t qiov_offset,
> BdrvRequestFlags flags)
> {
Callers:
bdrv_co_preadv_part() - adjusted this patch
bdrv_rw_co_entry(), called by bdrv_prwv_co(), which in turn is called by:
- bdrv_pwrite_zeroes() - int bytes
- bdrv_pwritev() - size_t bytes from qiov
- bdrv_preadv() - size_t bytes from qiov
This fixes a latent issue where pre-patch callers could pass size_t and
suffer truncation on a 64-bit platform; but in practice we never tickled
that bug because of our insistence on <2G read/write.
Usage:
> @@ -1524,7 +1524,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
> BdrvRequestPadding pad;
> int ret;
>
> - trace_bdrv_co_preadv(bs, offset, bytes, flags);
> + trace_bdrv_co_preadv_part(bs, offset, bytes, flags);
>
> ret = bdrv_check_byte_request(bs, offset, bytes);
takes int64_t (patch 3/17 in this series), and ensures the transaction
is <2G for the rest of the function (see the sketch after this list).
If we ever relax bdrv_check_byte_request, we also have to worry about:
tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
- takes int64_t, audited earlier in this series
ret = bdrv_aligned_preadv(child, &req, offset, bytes,
bs->bl.request_alignment,
qiov, qiov_offset, flags);
- takes int64_t, audited earlier in this series
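For reference, the <2G clamp from bdrv_check_byte_request() that all of
this relies on boils down to (paraphrasing patch 3/17 from memory, not
the exact code):

    if (bytes > BDRV_REQUEST_MAX_BYTES) { /* just under 2G */
        return -EIO;
    }

so everything downstream of a successful check still sees a value that
fits comfortably in 32 bits, even with the wider parameter types.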
> if (ret < 0) {
> @@ -1562,7 +1562,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
> }
>
> int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes,
> + int64_t offset, int64_t bytes,
> QEMUIOVector *qiov, size_t qiov_offset,
> BdrvRequestFlags flags)
> {
Callers analyzed above, usage within the function:
ret = bdrv_do_preadv_part(child, offset, bytes, qiov, qiov_offset,
flags);
- analyzed above
> @@ -1866,7 +1866,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>
> static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
> int64_t offset,
> - unsigned int bytes,
> + int64_t bytes,
> BdrvRequestFlags flags,
> BdrvTrackedRequest *req)
> {
Caller: bdrv_do_pwritev_part(), adjusted in this patch
Usage:
padding = bdrv_init_padding(bs, offset, bytes, &pad);
- analyzed above
ret = bdrv_aligned_pwritev(child, req, offset, aligned_bytes,
align,
NULL, 0, flags);
- takes int64_t, audited earlier in the series
assert(align == pad.tail + bytes);
- changes from 'uint64_t == size_t + unsigned int' to 'uint64_t ==
size_t + int64_t', but the semantics do not change
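To convince myself about that assert, a toy example with made-up values
(only the types matter):

    uint64_t align = 4096;
    size_t tail = 512;
    unsigned int old_bytes = 3584;
    int64_t new_bytes = 3584;
    assert(align == tail + old_bytes); /* pre-patch operand types */
    assert(align == tail + new_bytes); /* post-patch: same outcome */

On a 64-bit host both additions end up as an unsigned 64-bit comparison;
on a 32-bit host the post-patch addition happens in int64_t instead of
size_t, but for the non-negative, in-range values possible here the
result is the same either way.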
> @@ -1941,7 +1941,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>
> /* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
> static int coroutine_fn bdrv_do_pwritev_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
> + int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
> BdrvRequestFlags flags)
> {
> BlockDriverState *bs = child->bs;
Callers:
bdrv_co_pwritev_part() - adjusted this patch
bdrv_do_pwrite_zeroes() - passes 'int'
bdrv_rw_co_entry() analyzed above
Usage:
> @@ -1950,7 +1950,7 @@ static int coroutine_fn bdrv_do_pwritev_part(BdrvChild *child,
> BdrvRequestPadding pad;
> int ret;
>
> - trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
> + trace_bdrv_co_pwritev_part(child->bs, offset, bytes, flags);
>
> if (!bs->drv) {
> return -ENOMEDIUM;
ret = bdrv_check_byte_request(bs, offset, bytes);
- analyzed above; if we ever relax it, we must also worry about:
tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
- analyzed above
ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
- analyzed above
if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
- analyzed above
ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
qiov, qiov_offset, flags);
- takes int64_t, analyzed earlier this series
> @@ -2009,7 +2009,7 @@ out:
> }
>
> int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
> - int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
> + int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
> BdrvRequestFlags flags)
> {
> int ret;
Callers analyzed above, usage:
ret = bdrv_do_pwritev_part(child, offset, bytes, qiov, qiov_offset,
flags);
- analyzed above
> diff --git a/block/trace-events b/block/trace-events
> index 179b47bf63..dd367a9b19 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -11,8 +11,8 @@ blk_root_attach(void *child, void *blk, void *bs) "child %p blk %p bs %p"
> blk_root_detach(void *child, void *blk, void *bs) "child %p blk %p bs %p"
>
> # io.c
> -bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
> -bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x"
> +bdrv_co_preadv_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"
> +bdrv_co_pwritev_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"
Interesting that these trace events always took 64-bit numbers, but you
are renaming them to match the fact that they are now called from
different functions (since commit 1acc3466a2). I'd feel better if
'flags' were kept as unsigned int, even though semantically it doesn't
matter.
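If 'flags' stays unsigned int, the two lines would read (otherwise
identical to yours):

    bdrv_co_preadv_part(void *bs, int64_t offset, int64_t bytes, unsigned int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"
    bdrv_co_pwritev_part(void *bs, int64_t offset, int64_t bytes, unsigned int flags) "bs %p offset %" PRId64 " bytes %" PRId64 " flags 0x%x"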
> bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x"
> bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %" PRId64 " bytes %" PRId64 " cluster_offset %" PRId64 " cluster_bytes %" PRId64
> bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
>
I think if you fix bdrv_init_padding in a separate patch, then
everything else here is safe.
Reviewed-by: Eric Blake <eblake at redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org