[Stgt-devel] data corruption problems with stgt (aborted journal, remounting ro)?
Sat Feb 2 10:39:02 CET 2008
FUJITA Tomonori wrote:
> On Fri, 01 Feb 2008 13:05:41 +0100
> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
>> Tomasz Chmielewski wrote:
>>> It doesn't look like my posts get to this list... or is it just lagged a lot?
>>> Perhaps I'm doing something wrong - but with stgt I'm facing problems I
>>> didn't have with IET or SCST.
>>> Whenever I kill the tgtd daemon and start it again (i.e., target server
>>> restart), the initiator detects an aborted journal and remounts the
>>> device ro.
>>> Why is it so?
>>> What is the recommended way to kill the tgtd daemon? It doesn't seem to
>>> react to the TERM signal.
>> Hello, anyone there?
>> Is there a way to restart tgtd daemon or a machine running tgtd, so that
>> iSCSI connections don't break?
> What do you mean by 'restart tgtd daemon'? For me, 'restart' involves
> stopping the tgtd daemon, which closes all the iSCSI connections.
Stop it, and start again?
Imagine you want to upgrade your tgtd daemon or the kernel running on that
machine, or you have to restart the target machine for some other reason
(e.g. your target machine died).
With IET or SCST there is no problem with that - stop the target, and
iSCSI initiator will try to reconnect.
By default, open-iscsi tries to reconnect for 120 seconds without
returning an error to the SCSI layer, as defined in /etc/iscsi/iscsid.conf:
node.session.timeo.replacement_timeout = 120
But it is no problem to increase that value to even a couple of days
(imagine: admin stops the target machine erroneously on Friday, comes
back to work on Monday, starts the target, and initiators continue to
work as if nothing happened - processes were in an uninterruptible sleep
state, waiting for I/O operations to complete).
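To make the "couple of days" case concrete, here is a rough sketch in Python that works out the value in seconds and prints the corresponding iscsid.conf line. The helper name replacement_timeout_line and the three-day outage are just assumptions for illustration; only the setting name comes from the config file quoted above.

```python
from datetime import timedelta

# Setting name as it appears in /etc/iscsi/iscsid.conf.
SETTING = "node.session.timeo.replacement_timeout"


def replacement_timeout_line(outage: timedelta) -> str:
    """Return an iscsid.conf line whose timeout (in seconds) covers the outage.

    Hypothetical helper for illustration; not part of open-iscsi.
    """
    seconds = int(outage.total_seconds())
    return f"{SETTING} = {seconds}"


# Assumed outage: target stopped on Friday, started again on Monday (3 days).
print(replacement_timeout_line(timedelta(days=3)))
# prints: node.session.timeo.replacement_timeout = 259200
```

The default of 120 seconds corresponds to timedelta(seconds=120) in the same helper.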
With tgtd, it seems impossible to me - or am I wrong?