[sheepdog] [PATCH v4] sheepdog: selectable object size support

Teruaki Ishizaki ishizaki.teruaki at lab.ntt.co.jp
Fri Feb 13 02:33:04 CET 2015

(2015/02/12 11:55), Liu Yuan wrote:
> On Thu, Feb 12, 2015 at 11:33:16AM +0900, Teruaki Ishizaki wrote:
>> (2015/02/12 11:19), Liu Yuan wrote:
>>> On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote:
>>>> (2015/02/10 20:12), Liu Yuan wrote:
>>>>> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
>>>>>> Previously, qemu block driver of sheepdog used hard-coded VDI object size.
>>>>>> This patch enables users to handle "block_size_shift" value for
>>>>>> calculating VDI object size.
>>>>>> When you start qemu, you don't need to specify additional command option.
>>>>>> But when you create the VDI which doesn't have default object size
>>>>>> with qemu-img command, you specify block_size_shift option.
>>>>>> If you want to create a VDI of 8MB(1 << 23) object size,
>>>>>> you need to specify following command option.
>>>>>>   # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
>>>>>> In addition, when you don't specify qemu-img command option,
>>>>>> a default value of sheepdog cluster is used for creating VDI.
>>>>>>   # qemu-img create sheepdog:test2 100M
>>>>>> Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki at lab.ntt.co.jp>
>>>>>> ---
>>>>>> V4:
>>>>>>   - Limit a read/write buffer size for creating a preallocated VDI.
>>>>>>   - Replace a parse function for the block_size_shift option.
>>>>>>   - Fix an error message.
>>>>>> V3:
>>>>>>   - Delete the needless operation of buffer.
>>>>>>   - Delete the needless operations of request header.
>>>>>>   - Fix coding style problems.
>>>>>> V2:
>>>>>>   - Fix coding style problem (white space).
>>>>>>   - Add members, store_policy and block_size_shift to struct SheepdogVdiReq.
>>>>>>   - Initialize request header to use block_size_shift specified by user.
>>>>>> ---
>>>>>>   block/sheepdog.c          |  138 ++++++++++++++++++++++++++++++++++++++-------
>>>>>>   include/block/block_int.h |    1 +
>>>>>>   2 files changed, 119 insertions(+), 20 deletions(-)
>>>>>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>>>>>> index be3176f..a43b947 100644
>>>>>> --- a/block/sheepdog.c
>>>>>> +++ b/block/sheepdog.c
>>>>>> @@ -37,6 +37,7 @@
>>>>>>   #define SD_OP_READ_VDIS      0x15
>>>>>>   #define SD_OP_FLUSH_VDI      0x16
>>>>>>   #define SD_OP_DEL_VDI        0x17
>>>>>> +#define SD_OP_GET_CLUSTER_DEFAULT   0x18
>>>>> This might not be necessary. For old qemu or the qemu-img without setting
>>>>> option, the block_size_shift will be 0.
>>>>> If we make 0 to represent 4MB object, then we don't need to get the default
>>>>> cluster object size.
>>>>> We migth even get rid of the idea of cluster default size. The downsize is that,
>>>>> if we want to create a vdi with different size not the default 4MB,
>>>>> we have to write it every time for qemu-img or dog.
>>>>> If we choose to keep the idea of cluster default size, I think we'd also try to
>>>>> avoid call this request from QEMU to make backward compatibility easier. In this
>>>>> scenario, 0 might be used to ask new sheep to decide to use cluster default size.
>>>>> Both old qemu and new QEMU will send 0 to sheep and both old and new sheep can
>>>>> handle 0 though it has different meanings.
>>>>> Table for this bit as 0:
>>>>> Qe: qemu
>>>>> SD: Sheep daemon
>>>>> CDS: Cluster Default Size
>>>>> Ign: Ignored by the sheep daemon
>>>>> Qe/sd   new    old
>>>>> new     CDS    Ign
>>>>> old     CDS    NULL
>>>> Does Ign mean that VDI is handled as 4MB object size?
>>> Yes, old sheep can only handle 4MB object and doesn't check this field at all.
>>>>> I think this approach is acceptable. The difference to your patch is that
>>>>> we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and
>>>>> SD_OP_GET_CLUSTER_DEFAULT can be removed.
>>>> When users create a new VDI with qemu-img, qemu's Sheepdog backend
>>>> driver calculates max limit VDI size.
>>>> But if block_size_shift option is not specified, qemu's Sheepdog backend
>>>> driver can't calculate max limit VDI size.
>>> If block_size_shift not specified, this means
>>> 1 for old sheep, use 4MB size
>>> 2 for new sheep, use cluster wide default value.
>>> And sheep then can calculate it on its own, no?
>> Dog command(client) calculate max size, so I think
>> that qemu's Sheepdog backend driver should calculate it
>> like dog command.
>> Is that policy changeable?
> I checked the QEMU code and got your idea. In the past it was fixed size so very
> easy to hardcode the check in the client, no communication with sheep needed.
> Yes, if it is reasonable, we can change it.
> I think we can push the size calculation logic into sheep, if not the right size
> return INVALID_PARAMETER to clients. Clients just check this and report error
> back to users.
> There is no backward compability for this approach, since 4MB is the smallest
> size.
> OLD QEMU will limit the max_size as 4TB, which is no problem for new sheep.

I have checked the Qemu and sheepdog code.
When we resize VDI, sd_truncate() is called and
resize value is handled by Qemu.
(Sorry I haven't noticed this operation)

Then, sd_truncate() writes Sheepdog inode object directly.
So Sheepdog server can't handle maximum VDI size.

As I thought, should we use SD_OP_GET_CLUSTER_DEFAULT?
Should maxmimum VDI size be calculated on client program?


More information about the sheepdog mailing list