At Tue, 05 Jun 2012 21:30:58 +0800, Liu Yuan wrote: > > On 06/05/2012 09:23 PM, Christoph Hellwig wrote: > > > On Tue, Jun 05, 2012 at 09:20:30PM +0800, Liu Yuan wrote: > >> You meant the requests from Guest could be overlapped? Disk is seen by > >> guest OS as an array of blocks, I can't imagine any sane driver issue > >> overlapped requests to disk drive controller. > > > > Yes, we've seen overlapping or duplicate requests from the OS. > > > > Assuming any sort of sanity from the OS is unfortunately not an option > > in virtualized environments. > > > This would be something odd, but we have to face up with. Well, then > there might be problems with overlapped requests: > > Ordering: > 1 over network, the requests seen by Sheepdog might not be the same as > is from Guest. > 2 how do we deal with overlapped read/write, write/write pair? > > Looks to me, even with locks, we can't cope with it. The below change to the qemu block driver would fix the problem. I think this kind of ordering should be done in the gateway node, though. diff --git a/block/sheepdog.c b/block/sheepdog.c index 8877f45..68b35c0 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -645,6 +645,9 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid, ui if (QLIST_EMPTY(&acb->aioreq_head)) { sd_finish_aiocb(acb); } + } else { + /* send pending requests one by one */ + return; } } } @@ -710,13 +713,6 @@ static void coroutine_fn aio_read_response(void *opaque) s->inode.data_vdi_id[idx] = s->inode.vdi_id; s->max_dirty_data_idx = MAX(idx, s->max_dirty_data_idx); s->min_dirty_data_idx = MIN(idx, s->min_dirty_data_idx); - - /* - * Some requests may be blocked because simultaneous - * create requests are not allowed, so we search the - * pending requests here. - */ - send_pending_req(s, vid_to_data_oid(s->inode.vdi_id, idx), rsp.id); } break; case AIOCB_READ_UDATA: @@ -729,6 +725,13 @@ static void coroutine_fn aio_read_response(void *opaque) break; } + /* + * Some requests may be blocked because simultaneous requests to + * the same object are not allowed, so we search the pending + * requests here. + */ + send_pending_req(s, vid_to_data_oid(s->inode.vdi_id, idx), rsp.id); + if (rsp.result != SD_RES_SUCCESS) { acb->ret = -EIO; error_report("%s", sd_strerror(rsp.result)); @@ -1580,7 +1583,7 @@ static int coroutine_fn sd_co_rw_vector(void *p) aio_req = alloc_aio_req(s, acb, oid, len, offset, flags, old_oid, done); - if (create) { + if (1) { AIOReq *areq; QLIST_FOREACH(areq, &s->outstanding_aio_head, outstanding_aio_siblings) { @@ -1589,10 +1592,9 @@ static int coroutine_fn sd_co_rw_vector(void *p) } if (areq->oid == oid) { /* - * Sheepdog cannot handle simultaneous create - * requests to the same object. So we cannot send - * the request until the previous request - * finishes. + * Sheepdog cannot handle simultaneous requests to + * the same object. So we cannot send the request + * until the previous request finishes. */ aio_req->flags = 0; aio_req->base_oid = 0; |