[stgt] Write-cache in tgtd during a target-host crash

Wed Apr 21 11:51:49 CEST 2010

Regarding your initial question about what happens to data written to
tgtd that has not yet been destaged to stable storage:
that data will be lost and the initiator will not resend it.
Once a target says "I wrote it", it's too late to come back later and
say "I changed my mind".
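
To make the failure mode concrete, here is a toy model of a target with a
volatile write-back cache. It is only a sketch to illustrate the point
above: ToyTarget and its methods are made-up names, not tgtd code, and
real targets are of course far more involved.

```python
class ToyTarget:
    """Toy model of a target with a volatile write-back cache (not tgtd code)."""

    def __init__(self):
        self.disk = {}    # stable storage: survives a crash
        self.cache = {}   # volatile write cache: lost on a crash

    def write(self, lba, data):
        # Write-back: acknowledge as soon as the data sits in the cache.
        self.cache[lba] = data
        return "GOOD"

    def flush(self):
        # Destage cached blocks to stable storage.
        self.disk.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Target host crash: everything still in the cache is gone.
        self.cache.clear()

    def read(self, lba):
        # Serve from the cache if present, otherwise from stable storage.
        return self.cache.get(lba, self.disk.get(lba))


target = ToyTarget()
target.write(0, b"old")
target.flush()                     # b"old" is now on stable storage
status = target.write(0, b"new")   # acknowledged, but only cached
target.crash()                     # crash before the data is destaged
after_crash = target.read(0)       # the acknowledged write is gone
```

The initiator saw a good status for the second write, so it has no reason
to resend it, yet the block reads back the old contents after the crash.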

I think that when you change the mode page like this, it does not
change the behaviour of tgtd, only what tgtd will respond when an
initiator does a mode sense for that page.

For this mode page, and tgtd in general, I think it would be useful if
one could turn caching on/off via the mode select command and have that
toggle O_SYNC/O_DIRECT.

It would also be very useful to tell tgtd to use either or both of
O_SYNC/O_DIRECT for all I/O, depending on the semantics of the
filesystem used to host the LUN files.

Tomo, would a patch that allows setting the O_DIRECT/O_SYNC flags on
the command line when invoking tgtd be acceptable?

    tgtd -o O_SYNC
    tgtd -o O_DIRECT
    tgtd -o O_SYNC|O_DIRECT
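
As a rough sketch of how such an option value could be turned into open(2)
flags: the -o option is only a proposal at this point, and
parse_open_flags below is a hypothetical helper, written in Python for
brevity rather than in tgtd's C.

```python
import os

# Flag names the proposed (hypothetical) -o option would accept.
ALLOWED = ("O_SYNC", "O_DIRECT")


def parse_open_flags(spec):
    """Turn a string such as "O_SYNC|O_DIRECT" into an open(2) flag mask."""
    flags = 0
    for name in spec.split("|"):
        name = name.strip()
        if name not in ALLOWED:
            raise ValueError(f"unsupported flag: {name!r}")
        value = getattr(os, name, None)
        if value is None:
            # O_DIRECT, for example, is not defined on every platform.
            raise ValueError(f"{name} is not available on this platform")
        flags |= value
    return flags


# Flags that would be added when opening a LUN backing file.
both = parse_open_flags("O_SYNC|O_DIRECT")
```

Note that in most shells the last form would need quoting
(tgtd -o 'O_SYNC|O_DIRECT'), since an unquoted | starts a pipeline.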

The appropriate flags in the caching mode page could be intercepted
and used to toggle these fd settings.
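
A minimal sketch of that idea, assuming the flag of interest is WCE
(write cache enable), which is bit 2 of byte 2 in the Caching mode page
(page code 0x08). Mapping WCE=0 to O_SYNC is my assumption about how the
proposal might work, not existing tgtd behaviour, and the function names
are made up.

```python
import os

CACHING_PAGE_CODE = 0x08
WCE_BIT = 0x04  # byte 2, bit 2 of the Caching mode page


def wce_enabled(page):
    """Return True if the WCE bit is set in a raw Caching mode page."""
    # Byte 0 carries the page code in its low six bits.
    if len(page) < 3 or (page[0] & 0x3F) != CACHING_PAGE_CODE:
        raise ValueError("not a Caching mode page")
    return bool(page[2] & WCE_BIT)


def open_flags_for(page, base=os.O_RDWR):
    """Assumed policy: write cache disabled means synchronous writes."""
    return base if wce_enabled(page) else base | os.O_SYNC


# Example pages: page code 0x08, page length 0x12, then the flags byte.
cache_on = bytes([0x08, 0x12, 0x04]) + bytes(17)
cache_off = bytes([0x08, 0x12, 0x00]) + bytes(17)
```

One caveat if the toggle is meant to act on an already open fd: on Linux,
fcntl(F_SETFL) can change O_DIRECT but not O_SYNC, so switching O_SYNC on
or off would mean reopening the backing file.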

My filesystem of choice would likely benefit greatly from O_DIRECT.

regards
ronnie sahlberg

On Wed, Apr 21, 2010 at 7:22 PM, Chris Webb <chris at arachsys.com> wrote:
> FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> writes:
>
>> On Wed, 14 Apr 2010 09:35:38 +0100
>> Chris Webb <chris at arachsys.com> wrote:
>>
>> > Will the initiator have retained the data that hadn't reached disk and
>> > understand that it needs to resend, or will the volume end up corrupted with
>> > the initiator's page cache not matching the real content on the disk?
>>
>> I think that it depends on what you run on the initiator. For example,
>> when many file systems (such as ext3) hit a nexus loss (the target
>> crashes), they take the disk offline. The page cache on the initiator
>> is then lost.
>>
>> Note that data corruption is different from data loss.
>
> Hi. I'm accessing the iscsi block device directly on the initiator, not via
> a filesystem. (It's actually a qemu using the block device as a disk.)
>
> I currently have the write cache turned off on the target and
> node.session.timeo.replacement_timeout = 86400 on the initiator. The
> behaviour at present is that, if I crash or reboot the target, IO to the
> block device on the initiator is paused, and when the target comes back,
> the initiator logs in again and IO seamlessly resumes.
>
> My worry is that, if I turn on write caching on the target, there isn't an
> obvious way for IO to resume without corrupting the content of the block
> device because in-flight IO has been lost with the target crash but
> subsequent IO will then succeed. The writes that hadn't been flushed to disk
> will be silently lost without the qemu on the initiator knowing anything is
> wrong.
>
> Am I misunderstanding how this works, though?
>
> Cheers,
>
> Chris.
> --
> To unsubscribe from this list: send the line "unsubscribe stgt" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>