This is resolved: when strapped for memory, the VM fragments the
reads/writes. Buffered I/O goes through the page cache, and under
memory pressure the dirty pages get written back individually, in
page-sized (4K) chunks.

I bypassed the VM by adding O_DIRECT to the open() call in
backed_file_open (in bs_rdwr.c)... and the performance went up as
expected. (A sketch of the change is at the end of this message.)

Chris

On Tue, Sep 23, 2008 at 7:59 AM, Chris Worley <worleys at gmail.com> wrote:
> On Tue, Sep 23, 2008 at 2:44 AM, FUJITA Tomonori
> <fujita.tomonori at lab.ntt.co.jp> wrote:
>> On Mon, 22 Sep 2008 10:21:32 -0600
>> "Chris Worley" <worleys at gmail.com> wrote:
>>
>>> On Fri, Sep 19, 2008 at 7:40 PM, Chris Worley <worleys at gmail.com> wrote:
>>> >
>>> > I'm running CentOS 5.2 targets w/ a 2.6.24 kernel. The initiator is
>>> > Win2003. On the initiator side, the fs is formatted NTFS w/ a 4K
>>> > block size (and the NTFS block size seems to have nothing to do w/
>>> > this issue).
>>> >
>>> > Watching iostat on the target side, everything is being written to
>>> > the underlying disk in 512-byte operations.
>>> >
>>> > Best I can tell, it's the Linux side that's fragmenting the I/O.
>>> >
>>> > I could get a lot better performance if these were coalesced into
>>> > larger, variable block sizes (i.e. the initiator side is writing
>>> > much larger blocks).
>>> >
>>> > Is there something tgtd queries on the disk to get this information?
>>
>> tgtd doesn't do anything special. It opens a file on your file system
>> (or a device file such as /dev/sda) and performs read/write system
>> calls.
>>
>>
>>> > I don't see an fstat64 use of st_blksize in the source.
>>> >
>>> > I can put a dummy md "linear" device atop the disk and set the MD
>>> > device's chunk size to 4K... then everything to the MD device (as
>>> > well as to the underlying disk) is passed in 4K blocks... which
>>> > performs much better (though even larger blocks would do better
>>> > still if the user is writing larger blocks... and smaller blocks
>>> > force a read-modify-write that triples the I/O activity).
>>>
>>> I changed the MD to chunk at 8K blocks (and the NTFS on the w2003
>>> side to use 8K blocks), and tgtd was still chunking at 4K blocks.
>>>
>>> Does anybody have an idea where the fragmenting is occurring and/or
>>> how to stop it?
>>
>> Not sure, but I think the problem looks like a more generic one, not
>> specific to tgtd, right?
>>
>
> You're right in that tgtd gets a WRITE_10 command in bs_rdwr_request
> (in bs_rdwr.c) and calls pwrite64 w/ 64K lengths... and gets 64K
> returns.
>
> So I'm not sure who's fragmenting the I/O operation... but it is
> getting fragmented.
>
> And iostat shows those are getting chunked somewhere, i.e. the md
> device shows 4K chunking (where [read-B/s+write-B/s]/tps == 4096):
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> md0           16621.20        10.55        54.38         52        271
>
> I've looked into the pwrite64 kernel syscall: it calls vfs_write (in
> linux/fs/read_write.c), which can take two paths: 1) do_sync_write
> (which won't chunk the call), or 2) a callback in the pointer
> file->f_op->write. I don't have a clue who might fill in that
> callback, or what might get called there.
>
> I've never seen problems w/ pwrite/pread... so I'm very perplexed as
> to why this is getting fragmented.
>
> Chris
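
P.S. For anyone hitting the same thing, the change amounts to something
like the following. This is a minimal sketch, not the exact bs_rdwr.c
code: the real backed_file_open also probes the backing store's size
(block devices need the BLKGETSIZE64 ioctl rather than fstat), and the
signature may differ by tgt version.

    #define _GNU_SOURCE              /* needed for O_DIRECT */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <stdint.h>

    /* Sketch of backed_file_open() in bs_rdwr.c with the fix applied. */
    static int backed_file_open(char *path, int oflag, uint64_t *size)
    {
            struct stat st;
            int fd;

            /* The fix: OR in O_DIRECT so reads/writes bypass the page
             * cache entirely.  Caveat: O_DIRECT requires the buffer
             * address, file offset, and transfer length to be suitably
             * aligned (typically to the logical block size), or the
             * pread64/pwrite64 calls fail with EINVAL. */
            fd = open(path, oflag | O_DIRECT);
            if (fd < 0)
                    return fd;

            if (fstat(fd, &st) == 0)
                    *size = st.st_size;  /* block devices: BLKGETSIZE64 */

            return fd;
    }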
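
A couple of notes on the debugging in the thread above. The write path
in bs_rdwr.c boils down to a single pwrite64 of the whole transfer;
roughly like this (an illustrative stand-in, not the actual tgt
internals, which pull the buffer and offset out of the scsi_cmd):

    #define _LARGEFILE64_SOURCE      /* for pwrite64 */
    #include <unistd.h>
    #include <stdint.h>

    /* Illustrative stand-in for the WRITE_10 case in bs_rdwr_request():
     * tgtd hands the whole 64K payload to the kernel in one call, so
     * the 4K split seen by iostat happens below the syscall boundary. */
    static int write10_sketch(int fd, const void *buf,
                              uint32_t length, uint64_t offset)
    {
            ssize_t ret = pwrite64(fd, buf, length, offset);

            return (ret == (ssize_t)length) ? 0 : -1;
    }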
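
For the record, the 4K figure falls straight out of the iostat line
quoted above: (10.55 + 54.38) MB/s of combined traffic is about 64.93
MB/s, and 64.93 * 2^20 bytes/s divided by 16621.20 tps is ~4096 bytes
per transaction, i.e. exactly one page per request.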
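
Finally, the vfs_write dispatch mentioned above looks roughly like this
in 2.6.24 (abridged from linux/fs/read_write.c; permission checks and
accounting trimmed, so this is not standalone code). For a block device
opened without O_DIRECT, both paths land in the page cache (if memory
serves, def_blk_fops points write/aio_write at the generic buffered
write code), and the page cache is what writes back in 4K pages:

    /* Abridged from linux/fs/read_write.c (2.6.24). */
    ssize_t vfs_write(struct file *file, const char __user *buf,
                      size_t count, loff_t *pos)
    {
            ssize_t ret;

            /* ... FMODE_WRITE and rw_verify_area checks elided ... */

            if (file->f_op->write)
                    ret = file->f_op->write(file, buf, count, pos);
            else
                    ret = do_sync_write(file, buf, count, pos);

            /* ... dnotify and accounting elided ... */
            return ret;
    }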