[Stgt-devel] [Scst-devel] Integration of SCST in the mainstream Linux kernel

Thu Feb 7 01:20:30 CET 2008

On Wed, 06 Feb 2008 07:58:57 +0100
Tomasz Chmielewski <mangoo at wpkg.org> wrote:

> FUJITA Tomonori wrote:
> 
> (...)
> 
> >> Anyway - if tgtd can only be killed with the KILL signal - isn't it risky
> >> to do so? For example, something may still be cached and not fully
> >> transferred. Will the SCSI stack take care of it properly?
> > 
> > No risk. It's just like a power failure. The file system on the
> > initiator machine just sees the I/O failure. It will not break your
> > file system. With ext3, if a power failure happens during a transaction,
> > it just aborts the transaction.
> 
> Well, if the initiator sees any failure, it is a risk.

Then your definition of 'a risk' is different from mine.

For me, an I/O failure (due to an unexpected crash of a target) is not a
risk since it doesn't corrupt my file system (of course, I prefer not
to see I/O failures). I think that, in general, in the storage world,
corruption refers to corruption of the data stored in a file system (or
in something like a database).

> I have a target machine which used to freeze about once a week when I 
> used IET. No kernel logs, no oops, no panic - just a freeze.
> 
> Whenever that happened, I used to just restart that frozen target 
> machine, and initiators resumed their work without any failure (and 
> filesystems didn't even know SCSI layer had problems with reading and 
> writing for the whole night). With write cache enabled this could be 
> different of course.

Again, from the perspective of file systems, write cache doesn't
matter. Even with write cache, I/O failure doesn't corrupt file
systems.

> Curiously, this machine doesn't freeze when I use SCST or STGT.
> 
> So it's good to know a clean tgtd shutdown is somewhere on a TO DO list.
> 
> 
> >>> You want to reboot a server running target devices while initiators
> >>> connect to it. Rebooting the target server behind the initiators
> >>> seldom works. System administrators at my workplace reboot storage
> >>> devices once a year and tell us to shut down the initiator machines
> >>> that use them before that.
> >> Technically, it should always work (assuming the timeout on the
> >> initiators is longer than the time the target server takes to reboot).
> >> By default, the open-iscsi timeout is 120 seconds, but in practice
> >> there is no problem increasing it to even many days.
> > 
> > Well, what happens if people try to shut down the initiator box while
> > the target server is down?
> 
> The same should happen as if the initiator were running normally - tasks
> which want to read from or write to disk will be in an uninterruptible
> sleep until the initiator can connect again (and continue the shutdown
> process).

For me, uninterruptible sleep is really bad. I prefer to get an I/O
failure earlier.

> > It might work only in an environment where you control everything.
> 
> True, but I guess that covers most iSCSI usage scenarios?
> An iSCSI target is not a web server on the anonymous Internet, where
> anyone can connect.

For me, no. Commonly, the initiators provide some service to users
(that is, they are web servers, mail servers, or something else). So if
you shut down the target server behind the initiators, the HTTP daemon,
for example, blocks uninterruptibly and gives no response to the
users. As a user, I prefer seeing an error after an expected timeout to
waiting an unpredictable amount of time and guessing what's wrong.

Anyway, how you manage your systems doesn't matter to me. So I guess
it's time for me to fix things instead of discussing them.