[Sheepdog] [PATCH 3/3] store: use fallocate when allocating new objects
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Nov 11 12:41:25 CET 2011
At Fri, 11 Nov 2011 04:59:11 -0500,
Christoph Hellwig wrote:
>
> On Fri, Nov 11, 2011 at 04:10:00AM -0500, Christoph Hellwig wrote:
> > On Fri, Nov 11, 2011 at 06:06:16PM +0900, MORITA Kazutaka wrote:
> > > posix_fallocate() shows very poor performance if the underlying
> > > filesystem doesn't support fallocate() (e.g. ext3). How about using
> > > fallocate() instead of posix_fallocate(), and if it returns
> > > EOPNOTSUPP, writing SD_DATA_OBJ_SIZE bytes with one pwrite() call?
> >
> > At least for the samba use case (which is preallocating in 1MB chunks
> > and then filling it with 64k chunks) even the dumb preallocation has
> > shown benefit for ext3. I'll try to benchmark it soon and will report
> > the results to you.
>
> Numbers on my laptop with ext3 on the second dedicated test SSD,
> averaged over three runs (recreated fs each time, restarted sheepdog),
> all using
>
> dd if=/dev/zero of=/dev/vdc bs=67108864 count=16 oflag=direct
>
> note that this is on a fairly old kernel, and I manually had to mount
> with -o barrier=1
>
> With pwrite to the last sectors:
>
> 52.9MB/s for the initial write
> 49.0MB/s for the rewrite
>
> With fallocate:
>
> 62.7MB/s for the initial write
> 54.4MB/s for the rewrite
>
> From this it seems even the dumb fallocate is a clear win, which matches
> the Samba observations.
I've also tried Sheepdog with the fallocate patch, but it was
intolerably slow in my environment.
My environment is:
- Linux 2.6.32
- glibc 2.11
- 1TB SATA disk (write-cache is enabled)
- ext3 (barrier=1)
To test the performance of posix_fallocate() on ext3, I wrote the
following program.
==
#define _XOPEN_SOURCE 600	/* for pwrite() and posix_fallocate() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>

#define BUF_SIZE (4 * 1024 * 1024)

/* Write BUF_SIZE zero bytes with a single pwrite() call. */
void do_pwrite(int fd)
{
	ssize_t ret;
	static char buf[BUF_SIZE];

	ret = pwrite(fd, buf, BUF_SIZE, 0);
	assert(ret == BUF_SIZE);
}

/* Preallocate BUF_SIZE bytes through posix_fallocate(). */
void do_fallocate(int fd)
{
	int ret;

	ret = posix_fallocate(fd, 0, BUF_SIZE);
	assert(ret == 0);
}

int main(int argc, char *argv[])
{
	int fd;

	if (argc < 3) {
		printf("usage: %s [filename] (pwrite|fallocate)\n", argv[0]);
		return 1;
	}

	/* O_SYNC makes every write wait until the data has reached the disk. */
	fd = open(argv[1], O_SYNC | O_RDWR | O_CREAT | O_TRUNC, 0644);
	assert(fd >= 0);

	if (strcmp(argv[2], "pwrite") == 0)
		do_pwrite(fd);
	else if (strcmp(argv[2], "fallocate") == 0)
		do_fallocate(fd);

	close(fd);
	return 0;
}
==
The result was as follows:
$ time ./a.out temp pwrite
real 0m0.244s
user 0m0.000s
sys 0m0.008s
$ time ./a.out temp fallocate
real 0m43.050s
user 0m0.000s
sys 0m0.060s
I've confirmed similar results on other machines, too.
I guess posix_fallocate() causes a severe performance problem when
writes are slow, because when fallocate() is not available it falls
back to calling pwrite() once for every ext3 block. With O_SYNC, each
of those small writes has to complete before the next one starts.
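The fallback proposed earlier in this thread (try fallocate(), and on
EOPNOTSUPP write the whole object with one pwrite()) could look roughly
like the sketch below. This is only an illustration, not code from the
patch: the helper name prealloc() is made up, treating ENOSYS like
EOPNOTSUPP is my assumption, and a short write is simply reported as an
error. A caller would pass SD_DATA_OBJ_SIZE as len.

```c
#define _GNU_SOURCE		/* for fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): preallocate the first
 * 'len' bytes of 'fd'.  Returns 0 on success, -1 on failure.
 */
static int prealloc(int fd, off_t len)
{
	char *buf;
	ssize_t ret;

	/* Fast path: the filesystem supports native preallocation. */
	if (fallocate(fd, 0, 0, len) == 0)
		return 0;

	/* Any error other than "not supported" is a real failure. */
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;

	/* Slow path: zero-fill the range with a single pwrite() call. */
	buf = calloc(1, len);
	if (!buf)
		return -1;
	ret = pwrite(fd, buf, len, 0);
	free(buf);

	/* Assumption: a short write is treated as an error here. */
	return ret == (ssize_t)len ? 0 : -1;
}
```

The point of the single pwrite() is that an O_SYNC descriptor then waits
for one large write instead of one small write per block.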
Thanks,
Kazutaka