[sheepdog] [Qemu-devel] [PATCH] sheepdog: serialize requests to overwrapping area

Liu Yuan namei.unix at gmail.com
Tue Jul 28 16:47:37 CEST 2015


On Tue, Jul 28, 2015 at 10:31:32PM +0800, Liu Yuan wrote:
> On Mon, Jul 27, 2015 at 11:23:02AM -0400, Jeff Cody wrote:
> > On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> > > Current sheepdog driver only serializes create requests in oid
> > > unit. This mechanism isn't enough for handling requests to
> > > overwrapping area spanning multiple oids, so it can result bugs like
> > > below:
> > > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> > > 
> > > This patch adds a new serialization mechanism for the problem. The
> > > difference from the old one is:
> > > 1. serialize entire aiocb if their targetting areas overwrap
> > > 2. serialize all requests (read, write, and discard), not only creates
> > > 
> > > This patch also removes the old mechanism because the new one can be
> > > an alternative.
> > > 
> 
> Okay, I figured out what the problem is myself and allow me to try to make it
> clear to non-sheepdog devs:
> 
> sheepdog volume is thin-provision, so for the first write, we create the object
> internally, meaning that we need to handle write in two case:
> 
> 1. write to non-allocated object, create it then update inode, so in this case
>    two request will be generated: create(oid), update_inode(oid_to_inode_idx)
> 2. write the allocated object, just write(oid).
> 
> Current sheepdog driver use a range update_inode(min_idx, max_idx) for batching
> the updates. But there is subtle problem by determining min_idx and max_idx:
> 
> for a single create request, min_idx == max_idx, so actually we just update one
> one bit as expected.
> 
> Suppose we have 2 create request, create(10) and create(20), then min == 10,
> max==20 even though we just need to update index 10 and index 20, update_inode(10,20)
> will actually update range from 10 to 20. This would work if all the update_inode()
> requests won't overlap. But unfortunately, this is not true for some corner case.
> So the problem arise as following:
> 
> req 1: update_inode(10,20)
> req 2: update_inode(15,22)

I patched the qemu to print min and max, and confirmed my analysis:
diff --git a/block/sheepdog.c b/block/sheepdog.c
index bd7cbed..f3e40e8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1976,6 +1976,7 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 
     mn = s->min_dirty_data_idx;
     mx = s->max_dirty_data_idx;
+    printf("min %u, max %u\n", mn, mx);
     if (mn <= mx) {
         /* we need to update the vdi object. */
         offset = sizeof(s->inode) - sizeof(s->inode.data_vdi_id) +
diff --git a/dtc b/dtc
index 65cc4d2..bc895d6 160000

...
min 4294967295, max 0
min 9221, max 9222
min 9224, max 9728
min 9223, max 9223
min 9729, max 9730
min 9731, max 9732
min 9733, max 9734
min 9736, max 10240 
min 9735, max 10241 
...

Every line represents a update_inode(min, max) last 2 lines show 2 requests
actually overlap while I ran mkfs.ext4 on a sheepdog volume. The overlapped
requests might be reordered via network and cause inode[idx] back to 0 after
being set by last request. Then a wrong remove request will be executed by sheep
internally and accidentally remove a victim object.

Thanks,
Yuan

> req 1 and req 2 might have different value between [15,20] and cause problems.
> 
> Based on above analysis, I think the real fix is to fix update_inode(), not
> serialize all the requests in overkill way. The fix would be easy, considering
> most update_inode() update only 1 index, we could just make update_inode a
> single bit updater, not a range one, in which way we don't affect performance
> as the above patch did. (I actually suspect that the above patch might not solve
> the problem because update_inode() can overlap even with the patch).
> 
> If everyone agrees with my analysis, I'll post the fix.
> 
> Thanks,
> Yuan


More information about the sheepdog mailing list