[sheepdog] [PATCH v2 1/2] sheep: introduce journal file to boost IO performance
Liu Yuan
namei.unix at gmail.com
Mon Nov 12 04:35:01 CET 2012
On 11/12/2012 03:08 AM, MORITA Kazutaka wrote:
> This requires too much memory if we specify a big journal size
> (e.g. several gigabytes or more). We should use a fixed-size buffer
> (e.g. 4 MB) and iterate xpwrite to fill the file with zero bytes.
>
> However, I wonder again if this zero-fill is really beneficial. I've
> tested this patch with 4 GB journal size, and I've encountered the
> following problems.
> 1. It takes 50 seconds for sheep to start up.
> 2. There is a big performance penalty to VM I/Os while sheep fills
> zero bytes to journal files. (This happens when switching journal
> files)
>
> Does this zero-filling bring us more benefits than those problems?
>
It seems not. I've decided to switch to 'truncate & fallocate'.
>
>> > + if (!buf)
>> > + panic("%m\n");
>> > + memset(buf, 0, jfile_size);
>> > + wlen = xpwrite(fd, buf, jfile_size, 0);
>> > + if (wlen != jfile_size) {
>> > + eprintf("WARN: failed, %m\n");
>> > + return -1;
>> > + }
>> > +
>> > + free(buf);
>> > + return 0;
>> > +}
> (snip)
>
>> > +int journal_file_write(uint64_t oid, const char *buf, size_t size,
>> > + off_t offset, bool create)
>> > +{
>> > + struct journal_descriptor jd;
> jd should be initialized with zero bytes to avoid writing
> uninitialized data to the journal file.
>
>
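
A small sketch of the zero-initialization suggested above. The struct
layout here is hypothetical (only the field names come from the patch);
it is chosen so that the compiler inserts padding after 'create':

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; the real journal_descriptor may differ. */
struct journal_descriptor {
	uint32_t magic;
	bool create;		/* typically followed by padding bytes */
	uint64_t oid;
	uint64_t offset;
	uint64_t size;
};

#define JOURNAL_DESC_MAGIC 0xfee1900d	/* assumed value, illustration only */

/* Fill a descriptor so that bytes not covered by a field start as zero. */
static void journal_desc_init(struct journal_descriptor *jd, uint64_t oid,
			      uint64_t size, uint64_t offset, bool create)
{
	/* Clear the whole struct first, including any padding. */
	memset(jd, 0, sizeof(*jd));

	jd->magic = JOURNAL_DESC_MAGIC;
	jd->offset = offset;
	jd->size = size;
	jd->oid = oid;
	jd->create = create;
}
```

Without the memset, whatever was on the stack in the padding would be
copied into the journal file along with the descriptor.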
>> > + uint32_t marker = JOURNAL_END_MARKER;
>> > + int ret = SD_RES_SUCCESS;
>> > + ssize_t written, rusize = roundup(size, SECTOR_SIZE),
>> > + wsize = JOURNAL_META_SIZE + rusize;
>> > + off_t woff;
>> > + char *wbuffer, *p;
>> > +
>> > + jd.magic = JOURNAL_DESC_MAGIC;
>> > + jd.offset = offset;
>> > + jd.size = size;
>> > + jd.oid = oid;
>> > + jd.create = create;
>> > +
>> > + pthread_spin_lock(&jfile_lock);
>> > + if (!jfile_enough_space(wsize))
>> > + switch_journal_file();
>> > + woff = jfile.pos;
>> > + jfile.pos += wsize;
>> > + pthread_spin_unlock(&jfile_lock);
>> > +
>> > + p = wbuffer = valloc(wsize);
>> > + if (!wbuffer)
>> > + panic("%m\n");
>> > + memcpy(p, &jd, JOURNAL_DESC_SIZE);
>> > + p += JOURNAL_DESC_SIZE;
>> > + memcpy(p, buf, rusize);
>> > + p += rusize;
> The size of buf can be smaller than rusize when size is not
> SECTOR_SIZE aligned. This memcpy should be something like as follows.
>
wsize = JOURNAL_META_SIZE + rusize, so it will never be smaller than
rusize and is always SECTOR_SIZE aligned, no?
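
For reference, a sketch of a copy that reads only 'size' bytes from the
caller's buffer and zero-pads the destination up to the sector-aligned
length. This is one reading of the review comment above, not code from
the patch; copy_padded() and roundup_sector() are hypothetical names and
SECTOR_SIZE is assumed to be 512:

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512	/* assumed value */

/* Round size up to the next multiple of SECTOR_SIZE. */
static size_t roundup_sector(size_t size)
{
	return (size + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/*
 * Copy 'size' bytes from buf to dst, then zero the gap up to the
 * sector-aligned length. dst must have room for roundup_sector(size)
 * bytes; buf only needs to hold 'size' bytes. Returns the padded length.
 */
static size_t copy_padded(char *dst, const char *buf, size_t size)
{
	size_t rusize = roundup_sector(size);

	memcpy(dst, buf, size);			/* never reads past buf + size */
	memset(dst + size, 0, rusize - size);	/* pad the tail with zeros */

	return rusize;
}
```

With this shape the destination stays sector aligned while the source
is never read beyond the length the caller passed in.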
Thanks,
Yuan