[Sheepdog] [Qemu-devel] [PATCH] sheepdog: use coroutines

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Aug 23 19:14:45 CEST 2011


At Tue, 23 Aug 2011 14:29:50 +0200,
Kevin Wolf wrote:
> 
> Am 12.08.2011 14:33, schrieb MORITA Kazutaka:
> > This makes the sheepdog block driver support bdrv_co_readv/writev
> > instead of bdrv_aio_readv/writev.
> > 
> > With this patch, Sheepdog network I/O becomes fully asynchronous.  The
> > block driver yields back when send/recv returns EAGAIN, and is resumed
> > when the sheepdog network connection is ready for the operation.
> > 
> > Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> > ---
> >  block/sheepdog.c |  150 +++++++++++++++++++++++++++++++++--------------------
> >  1 files changed, 93 insertions(+), 57 deletions(-)
> > 
> > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > index e150ac0..e283c34 100644
> > --- a/block/sheepdog.c
> > +++ b/block/sheepdog.c
> > @@ -274,7 +274,7 @@ struct SheepdogAIOCB {
> >      int ret;
> >      enum AIOCBState aiocb_type;
> >  
> > -    QEMUBH *bh;
> > +    Coroutine *coroutine;
> >      void (*aio_done_func)(SheepdogAIOCB *);
> >  
> >      int canceled;
> > @@ -295,6 +295,10 @@ typedef struct BDRVSheepdogState {
> >      char *port;
> >      int fd;
> >  
> > +    CoMutex lock;
> > +    Coroutine *co_send;
> > +    Coroutine *co_recv;
> > +
> >      uint32_t aioreq_seq_num;
> >      QLIST_HEAD(outstanding_aio_head, AIOReq) outstanding_aio_head;
> >  } BDRVSheepdogState;
> > @@ -346,19 +350,16 @@ static const char * sd_strerror(int err)
> >  /*
> >   * Sheepdog I/O handling:
> >   *
> > - * 1. In the sd_aio_readv/writev, read/write requests are added to the
> > - *    QEMU Bottom Halves.
> > - *
> > - * 2. In sd_readv_writev_bh_cb, the callbacks of BHs, we send the I/O
> > - *    requests to the server and link the requests to the
> > - *    outstanding_list in the BDRVSheepdogState.  we exits the
> > - *    function without waiting for receiving the response.
> > + * 1. In sd_co_rw_vector, we send the I/O requests to the server and
> > + *    link the requests to the outstanding_list in the
> > + *    BDRVSheepdogState.  The function exits without waiting for
> > + *    receiving the response.
> >   *
> > - * 3. We receive the response in aio_read_response, the fd handler to
> > + * 2. We receive the response in aio_read_response, the fd handler to
> >   *    the sheepdog connection.  If metadata update is needed, we send
> >   *    the write request to the vdi object in sd_write_done, the write
> > - *    completion function.  The AIOCB callback is not called until all
> > - *    the requests belonging to the AIOCB are finished.
> > + *    completion function.  We switch back to sd_co_readv/writev after
> > + *    all the requests belonging to the AIOCB are finished.
> >   */
> >  
> >  static inline AIOReq *alloc_aio_req(BDRVSheepdogState *s, SheepdogAIOCB *acb,
> > @@ -398,7 +399,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
> >  static void sd_finish_aiocb(SheepdogAIOCB *acb)
> >  {
> >      if (!acb->canceled) {
> > -        acb->common.cb(acb->common.opaque, acb->ret);
> > +        qemu_coroutine_enter(acb->coroutine, NULL);
> >      }
> >      qemu_aio_release(acb);
> >  }
> > @@ -411,7 +412,8 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
> >       * Sheepdog cannot cancel the requests which are already sent to
> >       * the servers, so we just complete the request with -EIO here.
> >       */
> > -    acb->common.cb(acb->common.opaque, -EIO);
> > +    acb->ret = -EIO;
> > +    qemu_coroutine_enter(acb->coroutine, NULL);
> >      acb->canceled = 1;
> >  }
> >  
> > @@ -435,24 +437,12 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
> >  
> >      acb->aio_done_func = NULL;
> >      acb->canceled = 0;
> > -    acb->bh = NULL;
> > +    acb->coroutine = qemu_coroutine_self();
> >      acb->ret = 0;
> >      QLIST_INIT(&acb->aioreq_head);
> >      return acb;
> >  }
> >  
> > -static int sd_schedule_bh(QEMUBHFunc *cb, SheepdogAIOCB *acb)
> > -{
> > -    if (acb->bh) {
> > -        error_report("bug: %d %d", acb->aiocb_type, acb->aiocb_type);
> > -        return -EIO;
> > -    }
> > -
> > -    acb->bh = qemu_bh_new(cb, acb);
> > -    qemu_bh_schedule(acb->bh);
> > -    return 0;
> > -}
> > -
> >  #ifdef _WIN32
> >  
> >  struct msghdr {
> > @@ -635,7 +625,13 @@ static int do_readv_writev(int sockfd, struct iovec *iov, int len,
> >  again:
> >      ret = do_send_recv(sockfd, iov, len, iov_offset, write);
> >      if (ret < 0) {
> > -        if (errno == EINTR || errno == EAGAIN) {
> > +        if (errno == EINTR) {
> > +            goto again;
> > +        }
> > +        if (errno == EAGAIN) {
> > +            if (qemu_in_coroutine()) {
> > +                qemu_coroutine_yield();
> > +            }
> 
> Who reenters the coroutine if we yield here?

The fd handlers, co_write_request() and co_read_response(), will call
qemu_coroutine_enter() for the coroutine.  It will be restarted after
the sheepdog network connection becomes ready.

> 
> And of course for a coroutine based block driver I would like to see
> much less code in callback functions. But it's a good start anyway.

Yes, actually they can be much simpler.  This patch adds asynchrous
I/O support with a minimal change based on a coroutine, and I think
code simplification would be the next step.


Thanks,

Kazutaka



More information about the sheepdog mailing list