[sheepdog] [PATCH v3] sheepdog: selectable object size support

Teruaki Ishizaki ishizaki.teruaki at lab.ntt.co.jp
Tue Jan 27 04:57:34 CET 2015


(2015/01/26 19:33), Kevin Wolf wrote:
> Am 26.01.2015 um 10:52 hat Teruaki Ishizaki geschrieben:
>> Hi, Kevin
>>
>> Thanks for your review!
>>
>> (2015/01/23 22:53), Kevin Wolf wrote:
>>> Am 23.01.2015 um 09:24 hat Teruaki Ishizaki geschrieben:
>>>> Previously, qemu block driver of sheepdog used hard-coded VDI object size.
>>>> This patch enables users to handle "block_size_shift" value for
>>>> calculating VDI object size.
>>>>
>>>> When you start qemu, you don't need to specify additional command option.
>>>>
>>>> But when you create the VDI which doesn't have default object size
>>>> with qemu-img command, you specify block_size_shift option.
>>>>
>>>> If you want to create a VDI of 8MB(1 << 23) object size,
>>>> you need to specify following command option.
>>>>
>>>>   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
>>>>
>>>> In addition, when you don't specify qemu-img command option,
>>>> a default value of sheepdog cluster is used for creating VDI.
>>>>
>>>>   # qemu-img create sheepdog:test2 100M
>>>>
>>>> Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki at lab.ntt.co.jp>
>>>> ---
>>>> V3:
>>>>   - Delete the needless operation of buffer.
>>>>   - Delete the needless operations of request header
>>>>     for SD_OP_GET_CLUSTER_DEFAULT.
>>>>   - Fix coding style problems.
>>>>
>>>> V2:
>>>>   - Fix coding style problem (white space).
>>>>   - Add members, store_policy and block_size_shift to struct SheepdogVdiReq
>>>>   - Initialize request header to use block_size_shift specified by user.
>>>> ---
>>>>   block/sheepdog.c          |  140 ++++++++++++++++++++++++++++++++++++++-------
>>>>   include/block/block_int.h |    1 +
>>>>   2 files changed, 119 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>>>> index be3176f..c9f06db 100644
>>>> --- a/block/sheepdog.c
>>>> +++ b/block/sheepdog.c
>>>> @@ -37,6 +37,7 @@
>>>>   #define SD_OP_READ_VDIS      0x15
>>>>   #define SD_OP_FLUSH_VDI      0x16
>>>>   #define SD_OP_DEL_VDI        0x17
>>>> +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
>>>>
>>>>   #define SD_FLAG_CMD_WRITE    0x01
>>>>   #define SD_FLAG_CMD_COW      0x02
>>>> @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
>>>>       uint32_t base_vdi_id;
>>>>       uint8_t copies;
>>>>       uint8_t copy_policy;
>>>> -    uint8_t reserved[2];
>>>> +    uint8_t store_policy;
>>>> +    uint8_t block_size_shift;
>>>>       uint32_t snapid;
>>>>       uint32_t type;
>>>>       uint32_t pad[2];
>>>> @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
>>>>       uint32_t pad[5];
>>>>   } SheepdogVdiRsp;
>>>>
>>>> +typedef struct SheepdogClusterRsp {
>>>> +    uint8_t proto_ver;
>>>> +    uint8_t opcode;
>>>> +    uint16_t flags;
>>>> +    uint32_t epoch;
>>>> +    uint32_t id;
>>>> +    uint32_t data_length;
>>>> +    uint32_t result;
>>>> +    uint8_t nr_copies;
>>>> +    uint8_t copy_policy;
>>>> +    uint8_t block_size_shift;
>>>> +    uint8_t __pad1;
>>>> +    uint32_t __pad2[6];
>>>> +} SheepdogClusterRsp;
>>>> +
>>>>   typedef struct SheepdogInode {
>>>>       char name[SD_MAX_VDI_LEN];
>>>>       char tag[SD_MAX_VDI_TAG_LEN];
>>>> @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
>>>>       hdr.vdi_size = s->inode.vdi_size;
>>>>       hdr.copy_policy = s->inode.copy_policy;
>>>>       hdr.copies = s->inode.nr_copies;
>>>> +    hdr.block_size_shift = s->inode.block_size_shift;
>>>>
>>>>       ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
>>>>
>>>> @@ -1569,9 +1587,11 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
>>>>   static int sd_prealloc(const char *filename, Error **errp)
>>>>   {
>>>>       BlockDriverState *bs = NULL;
>>>> +    BDRVSheepdogState *base = NULL;
>>>>       uint32_t idx, max_idx;
>>>> +    uint32_t object_size;
>>>>       int64_t vdi_size;
>>>> -    void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
>>>> +    void *buf = NULL;
>>>>       int ret;
>>>>
>>>>       ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
>>>> @@ -1585,18 +1605,23 @@ static int sd_prealloc(const char *filename, Error **errp)
>>>>           ret = vdi_size;
>>>>           goto out;
>>>>       }
>>>> -    max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
>>>> +
>>>> +    base = bs->opaque;
>>>> +    object_size = (UINT32_C(1) << base->inode.block_size_shift);
>>>> +    buf = g_malloc0(object_size);
>>>
>>> If I understand correctly, block_size_shift can be up to 31, i.e. this
>>> is a 2 GB allocation. Do you really think this is a good idea?
>> I think that it is not best idea.
>> Sheepdog internal limit of block_size_shift is 31, so I set logical
>> limit of block_size_shift for 31.
>
> I think there is no real requirement to use the object size here, this
> is only the buffer size for some bdrv_read/write calls, which works fine
> in smaller chunks.
>
> So you could still allow block_size_shift up to 31, but use a different
> number here. Perhaps always leaving it at 4 MB is good enough, otherwise
> you could use something like MIN(24, base->inode.block_size_shift).
>
>>> At least use g_try_malloc0() here, so that a memory allocation failure
>>> doesn't crash qemu. (Same goes for all potentially huge allocations that
>>> you make in the whole codebase.)
>> As you pointed out, memory allocation was needed for that value.
>> I suppose that block_size_shift should be up to 26(64MB) or 27(128M)
>> actually.
>>
>> Should it be checked actual limits in that source code?
>
> I suggest limiting the maximum buffer size as above so that a failure
> becomes unlikely, and using g_try_malloc() to handle any allocation
> failure that occurs anyway gracefully.
OK, I'll change bdrv_read/write buffer size up to 4MB.

Thanks,
Teruaki Ishizaki





More information about the sheepdog mailing list