[sheepdog] [PATCH 2/2] farm: drop fcntl lock

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Jun 5 16:38:19 CEST 2012


At Tue, 05 Jun 2012 21:30:58 +0800,
Liu Yuan wrote:
> 
> On 06/05/2012 09:23 PM, Christoph Hellwig wrote:
> 
> > On Tue, Jun 05, 2012 at 09:20:30PM +0800, Liu Yuan wrote:
> >> You meant the requests from the Guest could be overlapped? The disk is
> >> seen by the guest OS as an array of blocks; I can't imagine any sane
> >> driver issuing overlapped requests to a disk drive controller.
> > 
> > Yes, we've seen overlapping or duplicate requests from the OS.
> > 
> > Assuming any sort of sanity from the OS is unfortunately not an option
> > in virtualized environments.
> 
> 
> This is odd, but something we have to face up to. Then there may be
> problems with overlapped requests:
> 
> Ordering:
> 1. Over the network, the order of requests seen by Sheepdog might not
> be the same as the order issued by the Guest.
> 2. How do we deal with overlapped read/write and write/write pairs?
> 
> It looks to me that, even with locks, we can't cope with this.

The change below to the qemu block driver would fix the problem.
I think this kind of ordering should be done on the gateway node,
though.


diff --git a/block/sheepdog.c b/block/sheepdog.c
index 8877f45..68b35c0 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -645,6 +645,9 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid, ui
             if (QLIST_EMPTY(&acb->aioreq_head)) {
                 sd_finish_aiocb(acb);
             }
+        } else {
+            /* send pending requests one by one */
+            return;
         }
     }
 }
@@ -710,13 +713,6 @@ static void coroutine_fn aio_read_response(void *opaque)
             s->inode.data_vdi_id[idx] = s->inode.vdi_id;
             s->max_dirty_data_idx = MAX(idx, s->max_dirty_data_idx);
             s->min_dirty_data_idx = MIN(idx, s->min_dirty_data_idx);
-
-            /*
-             * Some requests may be blocked because simultaneous
-             * create requests are not allowed, so we search the
-             * pending requests here.
-             */
-            send_pending_req(s, vid_to_data_oid(s->inode.vdi_id, idx), rsp.id);
         }
         break;
     case AIOCB_READ_UDATA:
@@ -729,6 +725,13 @@ static void coroutine_fn aio_read_response(void *opaque)
         break;
     }
 
+    /*
+     * Some requests may be blocked because simultaneous requests to
+     * the same object are not allowed, so we search the pending
+     * requests here.
+     */
+    send_pending_req(s, vid_to_data_oid(s->inode.vdi_id, idx), rsp.id);
+
     if (rsp.result != SD_RES_SUCCESS) {
         acb->ret = -EIO;
         error_report("%s", sd_strerror(rsp.result));
@@ -1580,7 +1583,7 @@ static int coroutine_fn sd_co_rw_vector(void *p)
 
         aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, done);
 
-        if (create) {
+        if (1) {
             AIOReq *areq;
             QLIST_FOREACH(areq, &s->outstanding_aio_head,
                           outstanding_aio_siblings) {
@@ -1589,10 +1592,9 @@ static int coroutine_fn sd_co_rw_vector(void *p)
                 }
                 if (areq->oid == oid) {
                     /*
-                     * Sheepdog cannot handle simultaneous create
-                     * requests to the same object.  So we cannot send
-                     * the request until the previous request
-                     * finishes.
+                     * Sheepdog cannot handle simultaneous requests to
+                     * the same object.  So we cannot send the request
+                     * until the previous request finishes.
                      */
                     aio_req->flags = 0;
                     aio_req->base_oid = 0;

