[Sheepdog] [Qemu-devel] [PATCH] sheepdog: use coroutines
Kevin Wolf
kwolf at redhat.com
Wed Aug 24 14:56:02 CEST 2011
Am 23.08.2011 19:14, schrieb MORITA Kazutaka:
> At Tue, 23 Aug 2011 14:29:50 +0200,
> Kevin Wolf wrote:
>>
>> Am 12.08.2011 14:33, schrieb MORITA Kazutaka:
>>> This makes the sheepdog block driver support bdrv_co_readv/writev
>>> instead of bdrv_aio_readv/writev.
>>>
>>> With this patch, Sheepdog network I/O becomes fully asynchronous. The
>>> block driver yields back when send/recv returns EAGAIN, and is resumed
>>> when the sheepdog network connection is ready for the operation.
>>>
>>> Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>>> ---
>>> block/sheepdog.c | 150 +++++++++++++++++++++++++++++++++--------------------
>>> 1 files changed, 93 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>>> index e150ac0..e283c34 100644
>>> --- a/block/sheepdog.c
>>> +++ b/block/sheepdog.c
>>> @@ -274,7 +274,7 @@ struct SheepdogAIOCB {
>>> int ret;
>>> enum AIOCBState aiocb_type;
>>>
>>> - QEMUBH *bh;
>>> + Coroutine *coroutine;
>>> void (*aio_done_func)(SheepdogAIOCB *);
>>>
>>> int canceled;
>>> @@ -295,6 +295,10 @@ typedef struct BDRVSheepdogState {
>>> char *port;
>>> int fd;
>>>
>>> + CoMutex lock;
>>> + Coroutine *co_send;
>>> + Coroutine *co_recv;
>>> +
>>> uint32_t aioreq_seq_num;
>>> QLIST_HEAD(outstanding_aio_head, AIOReq) outstanding_aio_head;
>>> } BDRVSheepdogState;
>>> @@ -346,19 +350,16 @@ static const char * sd_strerror(int err)
>>> /*
>>> * Sheepdog I/O handling:
>>> *
>>> - * 1. In the sd_aio_readv/writev, read/write requests are added to the
>>> - * QEMU Bottom Halves.
>>> - *
>>> - * 2. In sd_readv_writev_bh_cb, the callbacks of BHs, we send the I/O
>>> - * requests to the server and link the requests to the
>>> - * outstanding_list in the BDRVSheepdogState. we exits the
>>> - * function without waiting for receiving the response.
>>> + * 1. In sd_co_rw_vector, we send the I/O requests to the server and
>>> + * link the requests to the outstanding_list in the
>>> + * BDRVSheepdogState. The function exits without waiting for
>>> + * receiving the response.
>>> *
>>> - * 3. We receive the response in aio_read_response, the fd handler to
>>> + * 2. We receive the response in aio_read_response, the fd handler to
>>> * the sheepdog connection. If metadata update is needed, we send
>>> * the write request to the vdi object in sd_write_done, the write
>>> - * completion function. The AIOCB callback is not called until all
>>> - * the requests belonging to the AIOCB are finished.
>>> + * completion function. We switch back to sd_co_readv/writev after
>>> + * all the requests belonging to the AIOCB are finished.
>>> */
>>>
>>> static inline AIOReq *alloc_aio_req(BDRVSheepdogState *s, SheepdogAIOCB *acb,
>>> @@ -398,7 +399,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
>>> static void sd_finish_aiocb(SheepdogAIOCB *acb)
>>> {
>>> if (!acb->canceled) {
>>> - acb->common.cb(acb->common.opaque, acb->ret);
>>> + qemu_coroutine_enter(acb->coroutine, NULL);
>>> }
>>> qemu_aio_release(acb);
>>> }
>>> @@ -411,7 +412,8 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
>>> * Sheepdog cannot cancel the requests which are already sent to
>>> * the servers, so we just complete the request with -EIO here.
>>> */
>>> - acb->common.cb(acb->common.opaque, -EIO);
>>> + acb->ret = -EIO;
>>> + qemu_coroutine_enter(acb->coroutine, NULL);
>>> acb->canceled = 1;
>>> }
>>>
>>> @@ -435,24 +437,12 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
>>>
>>> acb->aio_done_func = NULL;
>>> acb->canceled = 0;
>>> - acb->bh = NULL;
>>> + acb->coroutine = qemu_coroutine_self();
>>> acb->ret = 0;
>>> QLIST_INIT(&acb->aioreq_head);
>>> return acb;
>>> }
>>>
>>> -static int sd_schedule_bh(QEMUBHFunc *cb, SheepdogAIOCB *acb)
>>> -{
>>> - if (acb->bh) {
>>> - error_report("bug: %d %d", acb->aiocb_type, acb->aiocb_type);
>>> - return -EIO;
>>> - }
>>> -
>>> - acb->bh = qemu_bh_new(cb, acb);
>>> - qemu_bh_schedule(acb->bh);
>>> - return 0;
>>> -}
>>> -
>>> #ifdef _WIN32
>>>
>>> struct msghdr {
>>> @@ -635,7 +625,13 @@ static int do_readv_writev(int sockfd, struct iovec *iov, int len,
>>> again:
>>> ret = do_send_recv(sockfd, iov, len, iov_offset, write);
>>> if (ret < 0) {
>>> - if (errno == EINTR || errno == EAGAIN) {
>>> + if (errno == EINTR) {
>>> + goto again;
>>> + }
>>> + if (errno == EAGAIN) {
>>> + if (qemu_in_coroutine()) {
>>> + qemu_coroutine_yield();
>>> + }
>>
>> Who reenters the coroutine if we yield here?
>
> The fd handlers, co_write_request() and co_read_response(), will call
> qemu_coroutine_enter() for the coroutine. It will be restarted after
> the sheepdog network connection becomes ready.
Makes sense. Applied to the block branch,
>> And of course for a coroutine based block driver I would like to see
>> much less code in callback functions. But it's a good start anyway.
>
> Yes, actually they can be much simpler. This patch adds asynchrous
> I/O support with a minimal change based on a coroutine, and I think
> code simplification would be the next step.
Ok, if you're looking into this, perfect. :-)
Kevin
More information about the sheepdog
mailing list