[Sheepdog] [PATCH 3/3] store: use fallocate when allocating new objects

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Fri Nov 11 12:41:25 CET 2011


At Fri, 11 Nov 2011 04:59:11 -0500,
Christoph Hellwig wrote:
> 
> On Fri, Nov 11, 2011 at 04:10:00AM -0500, Christoph Hellwig wrote:
> > On Fri, Nov 11, 2011 at 06:06:16PM +0900, MORITA Kazutaka wrote:
> > > posix_fallocate() shows very poor performance if the underlying
> > > filesystem doesn't support fallocate() (e.g. ext3).  How about using
> > > fallocate() instead of posix_fallocate(), and if it returns
> > > EOPNOTSUPP, writing SD_DATA_OBJ_SIZE bytes with one pwrite() call?
> > 
> > At least for the samba use case (which is preallocating in 1MB chunks
> > and then filling it with 64k chunks) even the dumb preallocation has
> > shown benefit for ext3.  I'll try to benchmark it soon and will report
> > the results to you.
> 
> Numbers on my laptop with ext3 on the second dedicated test SSD,
> averaged over three runs (recreated fs each time, restarted sheepdog),
> all using
> 
> 	dd if=/dev/zero of=/dev/vdc bs=67108864 count=16 oflag=direct
> 
> note that this is on a fairly old kernel, and I manually had to mount
> with -o barrier=1
> 
> With pwrite to the last sectors:
> 
> 	52.9MB/s for the initial write
> 	49.0MB/s for the rewrite
> 
> With fallocate:
> 
> 	62.7MB/s for the initial write
> 	54.4MB/s for the rewrite
> 
> From this it seems even the dumb fallocate is a clear win, which matches
> the Samba observations.

I've also tried Sheepdog with the fallocate patch, but it was
intolerably slow in my environment.

My environment is:
 - Linux 2.6.32
 - glibc 2.11
 - 1TB SATA disk (write-cache is enabled)
 - ext3 (barrier=1)

To test the performance of posix_fallocate() on ext3, I wrote the
following program.

==
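#include <stdio.h>
#include <string.h>	/* strcmp() */
#include <unistd.h>	/* pwrite(), close() */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>

#define BUF_SIZE (4 * 1024 * 1024)

/* Fill the first BUF_SIZE bytes with zeros using a single pwrite(). */
void do_pwrite(int fd)
{
	int ret;
	static char buf[BUF_SIZE];

	ret = pwrite(fd, buf, BUF_SIZE, 0);
	assert(ret == BUF_SIZE);
}

/* Allocate the first BUF_SIZE bytes with posix_fallocate(). */
void do_fallocate(int fd)
{
	int ret;

	ret = posix_fallocate(fd, 0, BUF_SIZE);
	assert(ret == 0);
}

int main(int argc, char *argv[])
{
	int fd;

	if (argc < 3) {
		printf("usage: %s [filename] (pwrite|fallocate)\n", argv[0]);
		return 1;
	}

	/* O_SYNC: every write must reach the disk before it returns. */
	fd = open(argv[1], O_SYNC | O_RDWR | O_CREAT | O_TRUNC, 0644);
	assert(fd >= 0);

	if (strcmp(argv[2], "pwrite") == 0)
		do_pwrite(fd);
	else if (strcmp(argv[2], "fallocate") == 0)
		do_fallocate(fd);
	else {
		/* Reject unknown modes instead of silently doing nothing. */
		fprintf(stderr, "unknown mode: %s\n", argv[2]);
		close(fd);
		return 1;
	}

	close(fd);

	return 0;
}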
==


The result was as follows:

  $ time ./a.out temp pwrite
  
  real   0m0.244s
  user   0m0.000s
  sys    0m0.008s

  $ time ./a.out temp fallocate
  
  real   0m43.050s
  user   0m0.000s
  sys    0m0.060s


I've confirmed similar results on other machines, too.

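I guess posix_fallocate() causes a severe performance problem when
writes are slow, because glibc emulates it with one pwrite() call per
ext3 block when fallocate() is not available.  With O_SYNC, as in the
test above, each of those small writes has to reach the disk before
the next one starts.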
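
The alternative suggested earlier in this thread (call fallocate()
directly and, if it returns EOPNOTSUPP, write the whole object with one
pwrite()) could look roughly like the sketch below.  This is only an
illustration, not the actual patch: prealloc_object() and OBJ_SIZE are
hypothetical names, OBJ_SIZE is assumed to be 4 MB to match BUF_SIZE in
the test program, and the fallback overwrites the range with zeros, so
it is only suitable for newly created objects.

```c
#define _GNU_SOURCE		/* for fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4 MB, the same as BUF_SIZE in the test program. */
#define OBJ_SIZE (4 * 1024 * 1024)

/*
 * Hypothetical helper: preallocate OBJ_SIZE bytes for a new object.
 * Returns 0 on success, -1 on failure.
 */
int prealloc_object(int fd)
{
	char *buf;
	ssize_t done;
	size_t off = 0;

	/* Fast path: the filesystem supports fallocate() natively. */
	if (fallocate(fd, 0, 0, OBJ_SIZE) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;

	/*
	 * Fallback: write zeros over the whole object.  Normally this is
	 * a single pwrite(); the loop only handles short writes.
	 */
	buf = calloc(1, OBJ_SIZE);
	if (!buf)
		return -1;

	while (off < OBJ_SIZE) {
		done = pwrite(fd, buf + off, OBJ_SIZE - off, off);
		if (done < 0 && errno == EINTR)
			continue;
		if (done <= 0)
			break;
		off += done;
	}

	free(buf);
	return off == OBJ_SIZE ? 0 : -1;
}
```

With this shape the O_SYNC cost is paid once for the whole object on
ext3 instead of once per block, which is the difference between the
two timings above.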

Thanks,

Kazutaka


