[sheepdog] [PATCH v2 1/2] sheep: introduce journal file to boost IO performance

Liu Yuan namei.unix at gmail.com
Mon Nov 12 04:35:01 CET 2012


On 11/12/2012 03:08 AM, MORITA Kazutaka wrote:
> This requires too much memory if we specify a big journal size
> (e.g. several giga bytes or more).  We should use fixed size buffer
> (e.g. 4 MB) and iterate xpwrite to fill the file with zero bytes.
> 
> However, I wonder again if this zero-fill is really beneficial.  I've
> tested this patch with 4 GB journal size, and I've encountered the
> following problems.
>  1. It takes 50 seconds for sheep to start up.
>  2. There is a big performance penalty to VM I/Os while sheep fills
>     zero bytes to journal files. (This happens when switching journal
>     files)
> 
> Does this zero-filling bring us more benefits than those problems?
> 

It seems not. I have decided to switch to 'truncate & fallocate'.
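
A minimal sketch of what 'truncate & fallocate' could look like, assuming a
hypothetical helper named prealloc_journal() (not code from the patch): the
file is truncated to empty and then its blocks are reserved with
posix_fallocate(), so no large zeroed buffer has to be written by hand.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: drop the old contents of
 * the journal file and reserve `size` bytes for it.
 * Returns 0 on success, -1 on failure.
 */
static int prealloc_journal(int fd, off_t size)
{
	int err;

	/* Discard whatever the previous journal contained. */
	if (ftruncate(fd, 0) < 0) {
		perror("ftruncate");
		return -1;
	}

	/*
	 * Reserve the space up front. posix_fallocate() returns an error
	 * number directly and does not set errno.
	 */
	err = posix_fallocate(fd, 0, size);
	if (err != 0) {
		fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
		return -1;
	}
	return 0;
}
```

After a successful call the file reads back as zeros up to `size`, without
the user-space memset and write of the whole journal.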

> 
>> > +	if (!buf)
>> > +		panic("%m\n");
>> > +	memset(buf, 0, jfile_size);
>> > +	wlen = xpwrite(fd, buf, jfile_size, 0);
>> > +	if (wlen != jfile_size) {
>> > +		eprintf("WARN: failed, %m\n");
>> > +		return -1;
>> > +	}
>> > +
>> > +	free(buf);
>> > +	return 0;
>> > +}
> (snip)
> 
>> > +int journal_file_write(uint64_t oid, const char *buf, size_t size,
>> > +		       off_t offset, bool create)
>> > +{
>> > +	struct journal_descriptor jd;
> jd should be initialized with zero bytes to avoid writing
> uninitialized data to the journal file.
> 
> 
>> > +	uint32_t marker = JOURNAL_END_MARKER;
>> > +	int ret = SD_RES_SUCCESS;
>> > +	ssize_t written, rusize = roundup(size, SECTOR_SIZE),
>> > +		wsize = JOURNAL_META_SIZE + rusize;
>> > +	off_t woff;
>> > +	char *wbuffer, *p;
>> > +
>> > +	jd.magic = JOURNAL_DESC_MAGIC;
>> > +	jd.offset = offset;
>> > +	jd.size = size;
>> > +	jd.oid = oid;
>> > +	jd.create = create;
>> > +
>> > +	pthread_spin_lock(&jfile_lock);
>> > +	if (!jfile_enough_space(wsize))
>> > +		switch_journal_file();
>> > +	woff = jfile.pos;
>> > +	jfile.pos += wsize;
>> > +	pthread_spin_unlock(&jfile_lock);
>> > +
>> > +	p = wbuffer = valloc(wsize);
>> > +	if (!wbuffer)
>> > +		panic("%m\n");
>> > +	memcpy(p, &jd, JOURNAL_DESC_SIZE);
>> > +	p += JOURNAL_DESC_SIZE;
>> > +	memcpy(p, buf, rusize);
>> > +	p += rusize;
> The size of buf can be smaller than rusize when size is not
> SECTOR_SIZE aligned.  This memcpy should be something like as follows.
> 

wsize = JOURNAL_META_SIZE + rusize, so it will never be smaller than rusize
and is always SECTOR_SIZE aligned, no?
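
For reference, a minimal sketch of the two sizes under discussion: rusize is
the sector-rounded length, while the caller's buf is only known to hold size
bytes. The helper names and the 512-byte SECTOR_SIZE are assumptions for
illustration, not code from the patch.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512	/* assumed value, for illustration only */

/* Round n up to the next multiple of SECTOR_SIZE. */
static size_t round_up_sector(size_t n)
{
	return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Hypothetical helper: copy exactly `size` payload bytes from src into
 * dst, then zero the padding up to the sector boundary. dst must have
 * room for round_up_sector(size) bytes. Returns the number of bytes
 * occupied in dst.
 */
static size_t copy_padded(char *dst, const char *src, size_t size)
{
	size_t rusize = round_up_sector(size);

	memcpy(dst, src, size);			/* read only what src holds */
	memset(dst + size, 0, rusize - size);	/* pad the tail with zeros */
	return rusize;
}
```

With this shape the destination stays sector aligned, and the source is
never read past size bytes.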

Thanks,
Yuan




