[sheepdog] [PATCH] sheepdog: fix overlapping metadata update

Liu Yuan namei.unix at gmail.com
Wed Jul 29 11:31:35 CEST 2015


On Wed, Jul 29, 2015 at 02:04:55PM +0900, Hitoshi Mitake wrote:
> At Wed, 29 Jul 2015 12:02:35 +0800,
> Liu Yuan wrote:
> > 
> > From: Liu Yuan <liuyuan at cmss.chinamobile.com>
> > 
> > Current sheepdog driver use a range update_inode(min_idx, max_idx) for batching
> > the updates. But there is subtle problem by determining min_idx and max_idx:
> > 
> > for a single create request, min_idx == max_idx, so actually we just update one
> > one bit as expected.
> > 
> > Suppose we have 2 create request, create(10) and create(20), then min == 10,
> > max==20 even though we just need to update index 10 and index 20, update_inode(10,20)
> > will actually update range from 10 to 20. This would work if all the update_inode()
> > requests won't overlap. But unfortunately, this is not true for some corner case.
> > So the problem arise as following:
> > 
> > req 1: update_inode(10,20)
> > req 2: update_inode(15,22)
> > 
> > req 1 and req 2 might have different value between [15,20] and cause problems
> > and can be illustrated as following by adding a printf in sd_write_done:
> > 
> > @@ -1976,6 +1976,7 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
> > 
> >      mn = s->min_dirty_data_idx;
> >      mx = s->max_dirty_data_idx;
> > +    printf("min %u, max %u\n", mn, mx);
> >      if (mn <= mx) {
> >          /* we need to update the vdi object. */
> >          offset = sizeof(s->inode) - sizeof(s->inode.data_vdi_id) +
> > 
> > ...
> > min 4294967295, max 0
> > min 9221, max 9222
> > min 9224, max 9728
> > min 9223, max 9223
> > min 9729, max 9730
> > min 9731, max 9732
> > min 9733, max 9734
> > min 9736, max 10240
> > min 9735, max 10241
> > ...
> > 
> > Every line represents a update_inode(min, max) last 2 lines show 2 requests
> > actually overlap while I ran mkfs.ext4 on a sheepdog volume. The overlapped
> > requests might be reordered via network and cause inode[idx] back to 0 after
> > being set by last request. Then a wrong remove request will be executed by sheep
> > internally and accidentally remove a victim object, which is reported at:
> > 
> > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> > 
> > The fix is simple that we just update inode one by one for aio_req. Since
> > aio_req is never overlapped, we'll have inode update never overlapped.
> 
> This patch increase a number of indoe update request than existing approach.
> current: (max - min + 1) * data object creation + 1 inode update
> this patch: (max - min + 1) * data object creation + (max - min + 1) * inode update
> The increased number means increased number of network + disk
> I/O. Even the overwrapping requests can be processed in parallel, the
> overhead seems to be large. It has an impact especially in a case of
> disk I/O heavy workload.
> I don't have a comparison of benchmark result, but it is not obvious
> that the approach of this patch is better.

Technically, it won't affect the performance because index updates are not range
but concrete in terms of underlying 4M block size. Only 2 or 3 indexes in a
range will be updated and 90+% updates will be only 1. So if 2 updates stride a
large range, it will actually worse the performance of sheepdog because many
additional unref of object will be executed by sheep internally.

It is not a performance problem but more the right fix. Even with your patch,
updates of inode can overlap. You just don't allow overlapped requests go to
sheepdog, which is a overkill approach. IMHO, we should only adjust to avoid
the overlapped inode updates, which can be done easily and incrementally on top
of old code, rather than take on a complete new untested overkill mechanism. So
what we get from your patch? Covering the problem and lock every requests?

Your patch actually fix nothing but just cover the problem by slowing down the
request and even with your patch, the problem still exists because inode updates
can overlap. Your commit log doesn't explain what is the real problem and why
your fix works. This is not your toy project that can commit whatever you want.

> BTW, sheepdog project was already forked, why don't you fork the block
> driver, too?

What makes you think you own the block driver?

We forked the sheepdog project because it is low quality of code partly and mostly
some company tries to make it a private project. It is not as open source friendly
as before and that is the main reason Kazutaka and I chose to fork the sheepdog
project. But this doesn't mean we need to fork the QEMU project, it is an
open source project and not your home toy.

Kazutaka and I are the biggest contributers of both sheepdog and QEMU sheepdog
block driver for years, so I think I am eligible to review the patch and
responsible to suggest the right fix. If you are pissed off when someone else
have other opinions, you can just fork the code and play with it at home or you
follow the rule of open source project.

Yuan


More information about the sheepdog mailing list