<div><br></div><div><div>what is sheepdog-ng?</div><div><br></div><div><br></div><div style="font-size: 12px;font-family: Arial Narrow;padding:2px 0 2px 0;">------------------ Original ------------------</div><div style="font-size: 12px;background:#efefef;padding:8px;"><div><b>From: </b> "sheepdog-request";<sheepdog-request@lists.wpkg.org>;</div><div><b>Date: </b> Tue, Jul 28, 2015 06:00 PM</div><div><b>To: </b> "sheepdog"<sheepdog@lists.wpkg.org>; <wbr></div><div></div><div><b>Subject: </b> sheepdog Digest, Vol 70, Issue 33</div></div><div><br></div>Send sheepdog mailing list submissions to<br>  sheepdog@lists.wpkg.org<br><br>To subscribe or unsubscribe via the World Wide Web, visit<br>  https://lists.wpkg.org/mailman/listinfo/sheepdog<br>or, via email, send a message with subject or body 'help' to<br>        sheepdog-request@lists.wpkg.org<br><br>You can reach the person managing the list at<br>      sheepdog-owner@lists.wpkg.org<br><br>When replying, please edit your Subject line so it is more specific<br>than "Re: Contents of sheepdog digest..."<br><br><br>Today's Topics:<br><br>   1. Re: [Qemu-devel] [PATCH] sheepdog: serialize requests to<br>      overwrapping area (Vasiliy Tolstov)<br>   2. Re: [Qemu-devel] [PATCH] sheepdog: serialize requests to<br>      overwrapping area (Jeff Cody)<br>   3. Re: [PATCH] sheepdog: serialize requests to overwrapping area<br>      (Liu Yuan)<br>   4. Re: [PATCH] sheepdog: serialize requests to overwrapping area<br>      (Liu Yuan)<br><br><br>----------------------------------------------------------------------<br><br>Message: 1<br>Date: Mon, 27 Jul 2015 18:36:08 +0300<br>From: Vasiliy Tolstov <v.tolstov@selfip.ru><br>To: Jeff Cody <jcody@redhat.com><br>Cc: Kevin Wolf <kwolf@redhat.com>, sheepdog@lists.wpkg.org, Hitoshi<br>     Mitake <mitake.hitoshi@lab.ntt.co.jp>, qemu-devel@nongnu.org, Stefan<br>    Hajnoczi <stefanha@redhat.com>, Teruaki Ishizaki<br>        <ishizaki.teruaki@lab.ntt.co.jp><br>Subject: Re: [sheepdog] [Qemu-devel] [PATCH] sheepdog: serialize<br>      requests to     overwrapping area<br>Message-ID:<br>        <CACaajQt_fzqgKts7RnqDvxFeuwFzQ6AHie1+TSXiye0x=di11A@mail.gmail.com><br>Content-Type: text/plain; charset=UTF-8<br><br>2015-07-27 18:23 GMT+03:00 Jeff Cody <jcody@redhat.com>:<br>> Thanks, applied to my block branch:<br>> https://github.com/codyprime/qemu-kvm-jtc/tree/block<br><br><br>Thanks! Waiting for adding to qemu rc =)<br><br>-- <br>Vasiliy Tolstov,<br>e-mail: v.tolstov@selfip.ru<br><br><br>------------------------------<br><br>Message: 2<br>Date: Mon, 27 Jul 2015 11:23:02 -0400<br>From: Jeff Cody <jcody@redhat.com><br>To: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp><br>Cc: Kevin Wolf <kwolf@redhat.com>, sheepdog@lists.wpkg.org,<br> qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>, Teruaki<br>   Ishizaki <ishizaki.teruaki@lab.ntt.co.jp><br>Subject: Re: [sheepdog] [Qemu-devel] [PATCH] sheepdog: serialize<br>     requests to overwrapping area<br>Message-ID: <20150727152302.GH21772@localhost.localdomain><br>Content-Type: text/plain; charset=us-ascii<br><br>On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:<br>> Current sheepdog driver only serializes create requests in oid<br>> unit. This mechanism isn't enough for handling requests to<br>> overwrapping area spanning multiple oids, so it can result bugs like<br>> below:<br>> https://bugs.launchpad.net/sheepdog-project/+bug/1456421<br>> <br>> This patch adds a new serialization mechanism for the problem. The<br>> difference from the old one is:<br>> 1. serialize entire aiocb if their targetting areas overwrap<br>> 2. serialize all requests (read, write, and discard), not only creates<br>> <br>> This patch also removes the old mechanism because the new one can be<br>> an alternative.<br>> <br>> Cc: Kevin Wolf <kwolf@redhat.com><br>> Cc: Stefan Hajnoczi <stefanha@redhat.com><br>> Cc: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp><br>> Cc: Vasiliy Tolstov <v.tolstov@selfip.ru><br>> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp><br>> ---<br>>  block/sheepdog.c | 152 ++++++++++++++++++++++++++-----------------------------<br>>  1 file changed, 71 insertions(+), 81 deletions(-)<br>> <br>> diff --git a/block/sheepdog.c b/block/sheepdog.c<br>> index bd7cbed..9585beb 100644<br>> --- a/block/sheepdog.c<br>> +++ b/block/sheepdog.c<br>> @@ -318,6 +318,10 @@ enum AIOCBState {<br>>      AIOCB_DISCARD_OBJ,<br>>  };<br>>  <br>> +#define AIOCBOverwrapping(x, y)                                 \<br>> +    (!(x->max_affect_data_idx < y->min_affect_data_idx          \<br>> +       || y->max_affect_data_idx < x->min_affect_data_idx))<br>> +<br>>  struct SheepdogAIOCB {<br>>      BlockAIOCB common;<br>>  <br>> @@ -334,6 +338,11 @@ struct SheepdogAIOCB {<br>>  <br>>      bool cancelable;<br>>      int nr_pending;<br>> +<br>> +    uint32_t min_affect_data_idx;<br>> +    uint32_t max_affect_data_idx;<br>> +<br>> +    QLIST_ENTRY(SheepdogAIOCB) aiocb_siblings;<br>>  };<br>>  <br>>  typedef struct BDRVSheepdogState {<br>> @@ -362,8 +371,10 @@ typedef struct BDRVSheepdogState {<br>>  <br>>      /* Every aio request must be linked to either of these queues. */<br>>      QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;<br>> -    QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;<br>>      QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;<br>> +<br>> +    CoQueue overwrapping_queue;<br>> +    QLIST_HEAD(inflight_aiocb_head, SheepdogAIOCB) inflight_aiocb_head;<br>>  } BDRVSheepdogState;<br>>  <br>>  static const char * sd_strerror(int err)<br>> @@ -498,13 +509,7 @@ static void sd_aio_cancel(BlockAIOCB *blockacb)<br>>      AIOReq *aioreq, *next;<br>>  <br>>      if (sd_acb_cancelable(acb)) {<br>> -        /* Remove outstanding requests from pending and failed queues.  */<br>> -        QLIST_FOREACH_SAFE(aioreq, &s->pending_aio_head, aio_siblings,<br>> -                           next) {<br>> -            if (aioreq->aiocb == acb) {<br>> -                free_aio_req(s, aioreq);<br>> -            }<br>> -        }<br>> +        /* Remove outstanding requests from failed queue.  */<br>>          QLIST_FOREACH_SAFE(aioreq, &s->failed_aio_head, aio_siblings,<br>>                             next) {<br>>              if (aioreq->aiocb == acb) {<br>> @@ -529,6 +534,10 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,<br>>                                     int64_t sector_num, int nb_sectors)<br>>  {<br>>      SheepdogAIOCB *acb;<br>> +    uint32_t object_size;<br>> +    BDRVSheepdogState *s = bs->opaque;<br>> +<br>> +    object_size = (UINT32_C(1) << s->inode.block_size_shift);<br>>  <br>>      acb = qemu_aio_get(&sd_aiocb_info, bs, NULL, NULL);<br>>  <br>> @@ -542,6 +551,11 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,<br>>      acb->coroutine = qemu_coroutine_self();<br>>      acb->ret = 0;<br>>      acb->nr_pending = 0;<br>> +<br>> +    acb->min_affect_data_idx = acb->sector_num * BDRV_SECTOR_SIZE / object_size;<br>> +    acb->max_affect_data_idx = (acb->sector_num * BDRV_SECTOR_SIZE +<br>> +                              acb->nb_sectors * BDRV_SECTOR_SIZE) / object_size;<br>> +<br>>      return acb;<br>>  }<br>>  <br>> @@ -703,38 +717,6 @@ static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag);<br>>  static int get_sheep_fd(BDRVSheepdogState *s, Error **errp);<br>>  static void co_write_request(void *opaque);<br>>  <br>> -static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)<br>> -{<br>> -    AIOReq *aio_req;<br>> -<br>> -    QLIST_FOREACH(aio_req, &s->pending_aio_head, aio_siblings) {<br>> -        if (aio_req->oid == oid) {<br>> -            return aio_req;<br>> -        }<br>> -    }<br>> -<br>> -    return NULL;<br>> -}<br>> -<br>> -/*<br>> - * This function searchs pending requests to the object `oid', and<br>> - * sends them.<br>> - */<br>> -static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid)<br>> -{<br>> -    AIOReq *aio_req;<br>> -    SheepdogAIOCB *acb;<br>> -<br>> -    while ((aio_req = find_pending_req(s, oid)) != NULL) {<br>> -        acb = aio_req->aiocb;<br>> -        /* move aio_req from pending list to inflight one */<br>> -        QLIST_REMOVE(aio_req, aio_siblings);<br>> -        QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);<br>> -        add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,<br>> -                        acb->aiocb_type);<br>> -    }<br>> -}<br>> -<br>>  static coroutine_fn void reconnect_to_sdog(void *opaque)<br>>  {<br>>      BDRVSheepdogState *s = opaque;<br>> @@ -840,12 +822,6 @@ static void coroutine_fn aio_read_response(void *opaque)<br>>                  s->max_dirty_data_idx = MAX(idx, s->max_dirty_data_idx);<br>>                  s->min_dirty_data_idx = MIN(idx, s->min_dirty_data_idx);<br>>              }<br>> -            /*<br>> -             * Some requests may be blocked because simultaneous<br>> -             * create requests are not allowed, so we search the<br>> -             * pending requests here.<br>> -             */<br>> -            send_pending_req(s, aio_req->oid);<br>>          }<br>>          break;<br>>      case AIOCB_READ_UDATA:<br>> @@ -1341,30 +1317,6 @@ out:<br>>      return ret;<br>>  }<br>>  <br>> -/* Return true if the specified request is linked to the pending list. */<br>> -static bool check_simultaneous_create(BDRVSheepdogState *s, AIOReq *aio_req)<br>> -{<br>> -    AIOReq *areq;<br>> -    QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {<br>> -        if (areq != aio_req && areq->oid == aio_req->oid) {<br>> -            /*<br>> -             * Sheepdog cannot handle simultaneous create requests to the same<br>> -             * object, so we cannot send the request until the previous request<br>> -             * finishes.<br>> -             */<br>> -            DPRINTF("simultaneous create to %" PRIx64 "\n", aio_req->oid);<br>> -            aio_req->flags = 0;<br>> -            aio_req->base_oid = 0;<br>> -            aio_req->create = false;<br>> -            QLIST_REMOVE(aio_req, aio_siblings);<br>> -            QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);<br>> -            return true;<br>> -        }<br>> -    }<br>> -<br>> -    return false;<br>> -}<br>> -<br>>  static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)<br>>  {<br>>      SheepdogAIOCB *acb = aio_req->aiocb;<br>> @@ -1379,10 +1331,6 @@ static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)<br>>              goto out;<br>>          }<br>>  <br>> -        if (check_simultaneous_create(s, aio_req)) {<br>> -            return;<br>> -        }<br>> -<br>>          if (s->inode.data_vdi_id[idx]) {<br>>              aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);<br>>              aio_req->flags |= SD_FLAG_CMD_COW;<br>> @@ -1458,8 +1406,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,<br>>      filename = qemu_opt_get(opts, "filename");<br>>  <br>>      QLIST_INIT(&s->inflight_aio_head);<br>> -    QLIST_INIT(&s->pending_aio_head);<br>>      QLIST_INIT(&s->failed_aio_head);<br>> +    QLIST_INIT(&s->inflight_aiocb_head);<br>>      s->fd = -1;<br>>  <br>>      memset(vdi, 0, sizeof(vdi));<br>> @@ -1524,6 +1472,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,<br>>      bs->total_sectors = s->inode.vdi_size / BDRV_SECTOR_SIZE;<br>>      pstrcpy(s->name, sizeof(s->name), vdi);<br>>      qemu_co_mutex_init(&s->lock);<br>> +    qemu_co_queue_init(&s->overwrapping_queue);<br>>      qemu_opts_del(opts);<br>>      g_free(buf);<br>>      return 0;<br>> @@ -2195,12 +2144,6 @@ static int coroutine_fn sd_co_rw_vector(void *p)<br>>                                  old_oid, done);<br>>          QLIST_INSERT_HEAD(&s->inflight_aio_head, aio_req, aio_siblings);<br>>  <br>> -        if (create) {<br>> -            if (check_simultaneous_create(s, aio_req)) {<br>> -                goto done;<br>> -            }<br>> -        }<br>> -<br>>          add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,<br>>                          acb->aiocb_type);<br>>      done:<br>> @@ -2215,6 +2158,20 @@ out:<br>>      return 1;<br>>  }<br>>  <br>> +static bool check_overwrapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *aiocb)<br>> +{<br>> +    SheepdogAIOCB *cb;<br>> +<br>> +    QLIST_FOREACH(cb, &s->inflight_aiocb_head, aiocb_siblings) {<br>> +        if (AIOCBOverwrapping(aiocb, cb)) {<br>> +            return true;<br>> +        }<br>> +    }<br>> +<br>> +    QLIST_INSERT_HEAD(&s->inflight_aiocb_head, aiocb, aiocb_siblings);<br>> +    return false;<br>> +}<br>> +<br>>  static coroutine_fn int sd_co_writev(BlockDriverState *bs, int64_t sector_num,<br>>                          int nb_sectors, QEMUIOVector *qiov)<br>>  {<br>> @@ -2234,14 +2191,25 @@ static coroutine_fn int sd_co_writev(BlockDriverState *bs, int64_t sector_num,<br>>      acb->aio_done_func = sd_write_done;<br>>      acb->aiocb_type = AIOCB_WRITE_UDATA;<br>>  <br>> +retry:<br>> +    if (check_overwrapping_aiocb(s, acb)) {<br>> +        qemu_co_queue_wait(&s->overwrapping_queue);<br>> +        goto retry;<br>> +    }<br>> +<br>>      ret = sd_co_rw_vector(acb);<br>>      if (ret <= 0) {<br>> +        QLIST_REMOVE(acb, aiocb_siblings);<br>> +        qemu_co_queue_restart_all(&s->overwrapping_queue);<br>>          qemu_aio_unref(acb);<br>>          return ret;<br>>      }<br>>  <br>>      qemu_coroutine_yield();<br>>  <br>> +    QLIST_REMOVE(acb, aiocb_siblings);<br>> +    qemu_co_queue_restart_all(&s->overwrapping_queue);<br>> +<br>>      return acb->ret;<br>>  }<br>>  <br>> @@ -2250,19 +2218,30 @@ static coroutine_fn int sd_co_readv(BlockDriverState *bs, int64_t sector_num,<br>>  {<br>>      SheepdogAIOCB *acb;<br>>      int ret;<br>> +    BDRVSheepdogState *s = bs->opaque;<br>>  <br>>      acb = sd_aio_setup(bs, qiov, sector_num, nb_sectors);<br>>      acb->aiocb_type = AIOCB_READ_UDATA;<br>>      acb->aio_done_func = sd_finish_aiocb;<br>>  <br>> +retry:<br>> +    if (check_overwrapping_aiocb(s, acb)) {<br>> +        qemu_co_queue_wait(&s->overwrapping_queue);<br>> +        goto retry;<br>> +    }<br>> +<br>>      ret = sd_co_rw_vector(acb);<br>>      if (ret <= 0) {<br>> +        QLIST_REMOVE(acb, aiocb_siblings);<br>> +        qemu_co_queue_restart_all(&s->overwrapping_queue);<br>>          qemu_aio_unref(acb);<br>>          return ret;<br>>      }<br>>  <br>>      qemu_coroutine_yield();<br>>  <br>> +    QLIST_REMOVE(acb, aiocb_siblings);<br>> +    qemu_co_queue_restart_all(&s->overwrapping_queue);<br>>      return acb->ret;<br>>  }<br>>  <br>> @@ -2610,14 +2589,25 @@ static coroutine_fn int sd_co_discard(BlockDriverState *bs, int64_t sector_num,<br>>      acb->aiocb_type = AIOCB_DISCARD_OBJ;<br>>      acb->aio_done_func = sd_finish_aiocb;<br>>  <br>> +retry:<br>> +    if (check_overwrapping_aiocb(s, acb)) {<br>> +        qemu_co_queue_wait(&s->overwrapping_queue);<br>> +        goto retry;<br>> +    }<br>> +<br>>      ret = sd_co_rw_vector(acb);<br>>      if (ret <= 0) {<br>> +        QLIST_REMOVE(acb, aiocb_siblings);<br>> +        qemu_co_queue_restart_all(&s->overwrapping_queue);<br>>          qemu_aio_unref(acb);<br>>          return ret;<br>>      }<br>>  <br>>      qemu_coroutine_yield();<br>>  <br>> +    QLIST_REMOVE(acb, aiocb_siblings);<br>> +    qemu_co_queue_restart_all(&s->overwrapping_queue);<br>> +<br>>      return acb->ret;<br>>  }<br>>  <br>> -- <br>> 1.9.1<br>> <br>><br><br>Thanks, applied to my block branch:<br>https://github.com/codyprime/qemu-kvm-jtc/tree/block<br><br><br>Jeff<br><br><br><br>------------------------------<br><br>Message: 3<br>Date: Tue, 28 Jul 2015 16:50:08 +0800<br>From: Liu Yuan <namei.unix@gmail.com><br>To: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp><br>Cc: Kevin Wolf <kwolf@redhat.com>, sheepdog@lists.wpkg.org,<br>        qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>, Teruaki<br>   Ishizaki <ishizaki.teruaki@lab.ntt.co.jp><br>Subject: Re: [sheepdog] [PATCH] sheepdog: serialize requests to<br>      overwrapping area<br>Message-ID: <20150728085008.GA4291@ubuntu-trusty><br>Content-Type: text/plain; charset=us-ascii<br><br>On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:<br>> Current sheepdog driver only serializes create requests in oid<br>> unit. This mechanism isn't enough for handling requests to<br>> overwrapping area spanning multiple oids, so it can result bugs like<br>> below:<br>> https://bugs.launchpad.net/sheepdog-project/+bug/1456421<br><br>I'm a bit late to review the patch since I'm not on the cc list, but I'd like to<br>get the idea how the mentioned bug relates to the serialization of requests?<br><br>The mentioned bug looks to me more a bug of sheepdog because the create and<br>write request will only unref a single oid. The bug report is unclear about<br>why the object idx in inode becomes zero, at least not pointing that it relates<br>to QEMU.<br><br>But this patch assume QEMU send the requests the wrong way and just vaguely<br>says it is just wrong without reason.<br><br>What is overrapping requests? As far as I understand, the request that stride<br>over two objects will be split into two, to make sure all the requests fit the<br>sheepdog object size. Allow requests run concurrently on different SD objects is<br>way achieving high performance. This patch mutes this feature, to me, without a<br>decent reason. <br><br>Probably I miss something hidden, but I'd like someone enlighten me about it<br>because this patch might slow down QEMU VM over sheepdog.<br><br>Thanks,<br>Yuan<br><br><br>------------------------------<br><br>Message: 4<br>Date: Tue, 28 Jul 2015 17:35:02 +0800<br>From: Liu Yuan <namei.unix@gmail.com><br>To: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp><br>Cc: Kevin Wolf <kwolf@redhat.com>, sheepdog@lists.wpkg.org, Jeff Cody<br>    <jcody@redhat.com>, qemu-devel@nongnu.org, Stefan Hajnoczi<br>      <stefanha@redhat.com>, Teruaki Ishizaki<br> <ishizaki.teruaki@lab.ntt.co.jp><br>Subject: Re: [sheepdog] [PATCH] sheepdog: serialize requests to<br>       overwrapping area<br>Message-ID: <20150728093502.GA8357@ubuntu-trusty><br>Content-Type: text/plain; charset=us-ascii<br><br>On Tue, Jul 28, 2015 at 04:50:08PM +0800, Liu Yuan wrote:<br>> On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:<br>> > Current sheepdog driver only serializes create requests in oid<br>> > unit. This mechanism isn't enough for handling requests to<br>> > overwrapping area spanning multiple oids, so it can result bugs like<br>> > below:<br>> > https://bugs.launchpad.net/sheepdog-project/+bug/1456421<br>> <br>> I'm a bit late to review the patch since I'm not on the cc list, but I'd like to<br>> get the idea how the mentioned bug relates to the serialization of requests?<br>> <br>> The mentioned bug looks to me more a bug of sheepdog because the create and<br>> write request will only unref a single oid. The bug report is unclear about<br>> why the object idx in inode becomes zero, at least not pointing that it relates<br>> to QEMU.<br>> <br>> But this patch assume QEMU send the requests the wrong way and just vaguely<br>> says it is just wrong without reason.<br>> <br>> What is overrapping requests? As far as I understand, the request that stride<br>> over two objects will be split into two, to make sure all the requests fit the<br>> sheepdog object size. Allow requests run concurrently on different SD objects is<br>> way achieving high performance. This patch mutes this feature, to me, without a<br>> decent reason. <br>> <br>> Probably I miss something hidden, but I'd like someone enlighten me about it<br>> because this patch might slow down QEMU VM over sheepdog.<br>> <br>> Thanks,<br>> Yuan<br><br>Cc Jeff<br><br><br>------------------------------<br><br>Subject: Digest Footer<br><br>_______________________________________________<br>sheepdog mailing list<br>sheepdog@lists.wpkg.org<br>https://lists.wpkg.org/mailman/listinfo/sheepdog<br><br><br>------------------------------<br><br>End of sheepdog Digest, Vol 70, Issue 33<br>****************************************</div>