[sheepdog] [Qemu-devel] [PATCH] sheepdog: fix overlapping metadata update

Liu Yuan namei.unix at gmail.com
Mon Aug 3 04:01:00 CEST 2015


On Thu, Jul 30, 2015 at 09:27:44AM -0400, Jeff Cody wrote:
> On Thu, Jul 30, 2015 at 09:41:08AM +0300, Vasiliy Tolstov wrote:
> > 2015-07-29 12:31 GMT+03:00 Liu Yuan <namei.unix at gmail.com>:
> > > > Technically, it won't affect the performance because index updates are not ranges
> > > > but concrete indexes in terms of the underlying 4M block size. Only 2 or 3 indexes in a
> > > > range will be updated and 90+% of updates will touch only 1. So if 2 updates stride a
> > > > large range, it will actually worsen the performance of sheepdog because many
> > > > additional unrefs of objects will be executed by sheep internally.
> > >
> > > > It is not a performance problem but more a question of the right fix. Even with your
> > > > patch, updates of the inode can overlap. You just don't allow overlapped requests to go
> > > > to sheepdog, which is an overkill approach. IMHO, we should only adjust to avoid
> > > > the overlapped inode updates, which can be done easily and incrementally on top
> > > > of the old code, rather than take on a completely new, untested overkill mechanism. So
> > > > what do we get from your patch? Covering up the problem and locking every request?
> > > >
> > > > Your patch actually fixes nothing; it just covers up the problem by slowing down the
> > > > requests, and even with your patch the problem still exists because inode updates
> > > > can overlap. Your commit log doesn't explain what the real problem is and why
> > > > your fix works. This is not your toy project where you can commit whatever you want.
> > >
> > >> BTW, sheepdog project was already forked, why don't you fork the block
> > >> driver, too?
> > >
> > > What makes you think you own the block driver?
> > >
> > > > We forked the sheepdog project partly because of its low code quality and mostly
> > > > because some company tries to make it a private project. It is not as open source
> > > > friendly as before, and that is the main reason Kazutaka and I chose to fork the
> > > > sheepdog project. But this doesn't mean we need to fork the QEMU project; it is an
> > > > open source project and not your home toy.
> > >
> > > > Kazutaka and I have been the biggest contributors to both sheepdog and the QEMU
> > > > sheepdog block driver for years, so I think I am eligible to review the patch and
> > > > responsible for suggesting the right fix. If you are pissed off when someone else
> > > > has other opinions, you can just fork the code and play with it at home, or you
> > > > follow the rules of an open source project.
> > 
> > 
> > Jeff Cody, please be the judge, patch from Hitoshi solved my problem
> > that i emailed in sheepdog list (i have test environment with 8 hosts
> > on each 6 SSD disks and infiniband interconnect between hosts) before
> > Hitoshi patch, massive writing to sheepdog storage breaks file system
> > and corrupt it.
> > After the patch i don't see issues.
> >
> 
> I'd rather see some sort consensus amongst Liu, Hitoshi, yourself, or
> others more intimately familiar with sheepdog.
> 
> Right now, we have Hitoshi's patch in the main git repo, slated for
> 2.4 release (which is Monday).  It sounds, from Liu's email, as this
> may not fix the root cause.
> 
> Vasiliy said he would test Liu's patch; if he can confirm this new
> patch fix, then I would be inclined to use Liu's patch, based on the
> detailed analysis of the issue in the commit message.
> 

This is my performance comparison on top of the latest QEMU, run on my laptop with an SSD.

The sheepdog cluster runs with 3 nodes, started with '-n' to get the best volume performance.
QEMU command:
qemu-system-x86_64 -m 1024 --enable-kvm \
	-drive file=debian_squeeze_amd64_standard.qcow2,cache=writeback,if=virtio \
	-drive file=sheepdog:test,if=virtio

sheepdog:test is created with 'dog vdi create test 80G'.

I measured both the time for mkfs and the IOPS for the fio write test.

fio.conf:
[global]
ioengine=libaio
direct=1
thread=1
norandommap=1
runtime=60
size=300M
directory=/mnt

[write4k-rand]
stonewall
group_reporting
bs=4k
rw=randwrite
numjobs=8
iodepth=32

Result:
================================================
sheep formatted with -c 2:1 (erasure coding)
         mkfs      fio (IOPS)
Yuan     0.069     4578
Hitoshi  0.071     3722

sheep formatted with -c 2 (replication)
         mkfs      fio (IOPS)
Yuan     0.074     6873
Hitoshi  0.081     6174
================================================
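
For reference, the relative differences implied by the table can be worked out
with a small script. This is only a sketch: the numbers are copied from the
results above, the names `results`, `relative_gain` and `summarize` are made up
for illustration, and the table does not say how many runs each figure
represents, so it says nothing about run-to-run variance.

```python
# Figures copied from the results table: (mkfs time, fio randwrite IOPS).
# The mkfs values are kept in whatever time unit the table uses.
results = {
    "erasure coding (-c 2:1)": {"Yuan": (0.069, 4578), "Hitoshi": (0.071, 3722)},
    "replication (-c 2)": {"Yuan": (0.074, 6873), "Hitoshi": (0.081, 6174)},
}


def relative_gain(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


def summarize(table):
    """For each format, return (name, fio IOPS gain, extra mkfs time) in percent."""
    rows = []
    for fmt, patches in table.items():
        yuan_mkfs, yuan_fio = patches["Yuan"]
        hitoshi_mkfs, hitoshi_fio = patches["Hitoshi"]
        rows.append((
            fmt,
            relative_gain(yuan_fio, hitoshi_fio),    # higher IOPS is better
            relative_gain(hitoshi_mkfs, yuan_mkfs),  # lower mkfs time is better
        ))
    return rows


for fmt, fio_gain, mkfs_extra in summarize(results):
    print(f"{fmt}: fio IOPS {fio_gain:+.1f}% with Yuan's patch, "
          f"mkfs time {mkfs_extra:+.1f}% with Hitoshi's patch")
```

With the figures above this works out to roughly 23% more fio IOPS with erasure
coding and roughly 11% more with replication, while the mkfs times differ by
about 3% and 9% respectively.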

Thanks,
Yuan

